public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive`
@ 2022-11-29 14:17 Lukas Wagner
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 1/2] debug cli: show more file attributes for `diff archive` command Lukas Wagner
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Lukas Wagner @ 2022-11-29 14:17 UTC (permalink / raw)
  To: pbs-devel

This patch series contains a few improvements for the `diff archive` tool,
mainly based on Wolfgang's suggestions.

First, the output of is now much more detailed and shows
some relevant file attributes, including what has changed between
snapshots. Changed attributes are highlighted by a "*".

For instance:

$ proxmox-backup-debug diff archive ...
A  f   644  10045  10000     0 B  2022-11-28 13:44:51  add.txt
M  f   644  10045  10000     6 B *2022-11-28 13:45:05  content.txt
D  f   644  10045  10000     0 B  2022-11-28 13:17:09  deleted.txt
M  f   644  10045    *29     0 B  2022-11-28 13:16:20  gid.txt
M  f  *777  10045  10000     0 B  2022-11-28 13:42:47  mode.txt
M  f   644  10045  10000     0 B *2022-11-28 13:44:33  mtime.txt
M  f   644  10045  10000    *7 B *2022-11-28 13:44:59 *size.txt
M  f   644 *64045  10000     0 B  2022-11-28 13:16:18  uid.txt
M *f   644  10045  10000    10 B  2022-11-28 13:44:59  type.txt

The second commit introduces the possiblity to pass
the --compare-content flag to the tool. If the flag is passed,
the tool will compare the file content instead of relying on mtime
alone to detect modifications.

Lukas Wagner (2):
  debug cli: show more file attributes for `diff archive` command
  debug cli: add 'compare-content' flag to `diff archive` command

 src/bin/proxmox_backup_debug/diff.rs | 356 ++++++++++++++++++++++-----
 1 file changed, 299 insertions(+), 57 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 9+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 1/2] debug cli: show more file attributes for `diff archive` command
  2022-11-29 14:17 [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Lukas Wagner
@ 2022-11-29 14:17 ` Lukas Wagner
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to " Lukas Wagner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2022-11-29 14:17 UTC (permalink / raw)
  To: pbs-devel

This commit enriches the output of the `diff archive` command,
showing pxar entry type, mode, uid, gid, size, mtime and filename.
Attributes that changed between both snapshots are prefixed
with a "*".

For instance:

$ proxmox-backup-debug diff archive ...
A  f   644  10045  10000    0 B  2022-11-28 13:44:51  add.txt
M  f   644  10045  10000    6 B *2022-11-28 13:45:05  content.txt
D  f   644  10045  10000    0 B  2022-11-28 13:17:09  deleted.txt
M  f   644  10045    *29    0 B  2022-11-28 13:16:20  gid.txt
M  f  *777  10045  10000    0 B  2022-11-28 13:42:47  mode.txt
M  f   644  10045  10000    0 B *2022-11-28 13:44:33  mtime.txt
M  f   644  10045  10000   *7 B *2022-11-28 13:44:59 *size.txt
M  f   644 *64045  10000    0 B  2022-11-28 13:16:18  uid.txt
M *f   644  10045  10000   10 B  2022-11-28 13:44:59  type_changed.txt

Also, this commit ensures that we always show the *new* type.
Previously, the command showed the old type if it was changed.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 src/bin/proxmox_backup_debug/diff.rs | 221 ++++++++++++++++++++-------
 1 file changed, 167 insertions(+), 54 deletions(-)

diff --git a/src/bin/proxmox_backup_debug/diff.rs b/src/bin/proxmox_backup_debug/diff.rs
index c304c110..d5a4a1fe 100644
--- a/src/bin/proxmox_backup_debug/diff.rs
+++ b/src/bin/proxmox_backup_debug/diff.rs
@@ -11,7 +11,7 @@ use futures::FutureExt;
 use proxmox_router::cli::{CliCommand, CliCommandMap, CommandLineInterface};
 use proxmox_schema::api;
 
-use pbs_api_types::{BackupNamespace, BackupPart};
+use pbs_api_types::{BackupNamespace, BackupPart, HumanByte};
 use pbs_client::tools::key_source::{
     crypto_parameters, format_key_source, get_encryption_key_password, KEYFD_SCHEMA,
 };
@@ -173,15 +173,18 @@ async fn diff_archive(
     // If file is present in both snapshots, it *might* be modified, but does not have to be.
     // If another, unmodified file resides in the same chunk as an actually modified one,
     // it will also show up as modified here...
-    let potentially_modified: HashMap<&OsStr, &FileEntry> = files_in_a
+    let potentially_modified: HashMap<&OsStr, (&FileEntry, &FileEntry)> = files_in_a
         .iter()
-        .filter(|(path, _)| files_in_b.contains_key(*path))
-        .map(|(path, entry)| (path.as_os_str(), entry))
+        .filter_map(|(path, entry_a)| {
+            files_in_b
+                .get(path)
+                .map(|entry_b| (path.as_os_str(), (entry_a, entry_b)))
+        })
         .collect();
 
     // ... so we compare the file metadata/contents to narrow the selection down to files
     // which where *really* modified.
-    let modified_files = compare_files(&files_in_a, &files_in_b, potentially_modified).await?;
+    let modified_files = compare_files(potentially_modified).await?;
 
     show_file_list(&added_files, &deleted_files, &modified_files);
 
@@ -335,7 +338,6 @@ fn visit_directory<'f, 'c>(
                 let digest = chunk_list.get(chunk_index).context("Invalid chunk index")?;
 
                 if chunk_diff.get(digest).is_some() {
-                    // files.insert(file_path.clone().into_os_string());
                     entries.insert(file_path.into_os_string(), entry);
                     break;
                 }
@@ -349,62 +351,177 @@ fn visit_directory<'f, 'c>(
 
 /// Check if files were actually modified
 async fn compare_files<'a>(
-    entries_a: &HashMap<OsString, FileEntry>,
-    entries_b: &HashMap<OsString, FileEntry>,
-    files: HashMap<&'a OsStr, &'a FileEntry>,
-) -> Result<HashMap<&'a OsStr, &'a FileEntry>, Error> {
+    files: HashMap<&'a OsStr, (&'a FileEntry, &'a FileEntry)>,
+) -> Result<HashMap<&'a OsStr, (&'a FileEntry, ChangedProperties)>, Error> {
     let mut modified_files = HashMap::new();
 
-    for (path, entry) in files {
-        let p = path.to_os_string();
-        let file_a = entries_a.get(&p).context("File entry not in map")?;
-        let file_b = entries_b.get(&p).context("File entry not in map")?;
-
-        if !compare_file(&file_a, &file_b).await {
-            modified_files.insert(path, entry);
+    for (path, (entry_a, entry_b)) in files {
+        if let Some(changed) = compare_file(entry_a, entry_b).await? {
+            modified_files.insert(path, (entry_b, changed));
         }
     }
 
     Ok(modified_files)
 }
 
-async fn compare_file(file_a: &FileEntry, file_b: &FileEntry) -> bool {
-    if file_a.metadata() != file_b.metadata() {
-        // Check if mtime, permissions, ACLs, etc. have changed - if they have changed, we consider
-        // the file as modified.
-        return false;
-    }
+async fn compare_file(
+    file_a: &FileEntry,
+    file_b: &FileEntry,
+) -> Result<Option<ChangedProperties>, Error> {
+    let mut changed = ChangedProperties::default();
+
+    changed.set_from_metadata(file_a, file_b);
 
     match (file_a.kind(), file_b.kind()) {
         (EntryKind::Symlink(a), EntryKind::Symlink(b)) => {
             // Check whether the link target has changed.
-            a.as_os_str() == b.as_os_str()
+            changed.content = a.as_os_str() != b.as_os_str();
         }
         (EntryKind::Hardlink(a), EntryKind::Hardlink(b)) => {
             // Check whether the link target has changed.
-            a.as_os_str() == b.as_os_str()
+            changed.content = a.as_os_str() != b.as_os_str();
+        }
+        (EntryKind::Device(a), EntryKind::Device(b)) => {
+            changed.content = a.major != b.major || a.minor != b.minor
         }
-        (EntryKind::Device(a), EntryKind::Device(b)) => a.major == b.major && a.minor == b.minor,
-        (EntryKind::Socket, EntryKind::Socket) => true,
-        (EntryKind::Fifo, EntryKind::Fifo) => true,
         (EntryKind::File { size: size_a, .. }, EntryKind::File { size: size_b, .. }) => {
-            // At this point we know that all metadata including mtime is
-            // the same. To speed things up, we consider the files as equal if they also have
-            // the same size.
-            // If one were completely paranoid, one could compare the actual file contents,
-            // but this decreases performance drastically.
-            size_a == size_b
+            if size_a != size_b {
+                changed.size = true;
+                changed.content = true;
+            };
+        }
+        (EntryKind::Directory, EntryKind::Directory) => {}
+        (EntryKind::Socket, EntryKind::Socket) => {}
+        (EntryKind::Fifo, EntryKind::Fifo) => {}
+        (_, _) => {
+            changed.entry_type = true;
         }
-        (EntryKind::Directory, EntryKind::Directory) => true,
-        (_, _) => false, // Kind has changed, so we of course consider it modified.
+    }
+
+    if changed.any() {
+        Ok(Some(changed))
+    } else {
+        Ok(None)
+    }
+}
+
+#[derive(Copy, Clone, Default)]
+struct ChangedProperties {
+    entry_type: bool,
+    mtime: bool,
+    acl: bool,
+    xattrs: bool,
+    fcaps: bool,
+    quota_project_id: bool,
+    mode: bool,
+    flags: bool,
+    uid: bool,
+    gid: bool,
+    size: bool,
+    content: bool,
+}
+
+impl ChangedProperties {
+    fn set_from_metadata(&mut self, file_a: &FileEntry, file_b: &FileEntry) {
+        let a = file_a.metadata();
+        let b = file_b.metadata();
+
+        self.acl = a.acl != b.acl;
+        self.xattrs = a.xattrs != b.xattrs;
+        self.fcaps = a.fcaps != b.fcaps;
+        self.quota_project_id = a.quota_project_id != b.quota_project_id;
+        self.mode = a.stat.mode != b.stat.mode;
+        self.flags = a.stat.flags != b.stat.flags;
+        self.uid = a.stat.uid != b.stat.uid;
+        self.gid = a.stat.gid != b.stat.gid;
+        self.mtime = a.stat.mtime != b.stat.mtime;
+    }
+
+    fn any(&self) -> bool {
+        self.entry_type
+            || self.mtime
+            || self.acl
+            || self.xattrs
+            || self.fcaps
+            || self.quota_project_id
+            || self.mode
+            || self.flags
+            || self.uid
+            || self.gid
+            || self.content
+    }
+}
+
+fn change_indicator(changed: bool) -> &'static str {
+    if changed {
+        "*"
+    } else {
+        " "
+    }
+}
+
+fn format_filesize(entry: &FileEntry, changed: bool) -> String {
+    if let Some(size) = entry.file_size() {
+        format!(
+            "{}{:.1}",
+            change_indicator(changed),
+            HumanByte::new_decimal(size as f64)
+        )
+    } else {
+        String::new()
     }
 }
 
+fn format_mtime(entry: &FileEntry, changed: bool) -> String {
+    let mtime = &entry.metadata().stat.mtime;
+
+    let format = if changed { "*%F %T" } else { " %F %T" };
+
+    proxmox_time::strftime_local(format, mtime.secs).unwrap_or_default()
+}
+
+fn format_mode(entry: &FileEntry, changed: bool) -> String {
+    let mode = entry.metadata().stat.mode & 0o7777;
+    format!("{}{:o}", change_indicator(changed), mode)
+}
+
+fn format_entry_type(entry: &FileEntry, changed: bool) -> String {
+    let kind = match entry.kind() {
+        EntryKind::Symlink(_) => "l",
+        EntryKind::Hardlink(_) => "h",
+        EntryKind::Device(_) if entry.metadata().stat.is_blockdev() => "b",
+        EntryKind::Device(_) => "c",
+        EntryKind::Socket => "s",
+        EntryKind::Fifo => "p",
+        EntryKind::File { .. } => "f",
+        EntryKind::Directory => "d",
+        _ => " ",
+    };
+
+    format!("{}{}", change_indicator(changed), kind)
+}
+
+fn format_uid(entry: &FileEntry, changed: bool) -> String {
+    format!("{}{}", change_indicator(changed), entry.metadata().stat.uid)
+}
+
+fn format_gid(entry: &FileEntry, changed: bool) -> String {
+    format!("{}{}", change_indicator(changed), entry.metadata().stat.gid)
+}
+
+fn format_file_name(entry: &FileEntry, changed: bool) -> String {
+    format!(
+        "{}{}",
+        change_indicator(changed),
+        entry.file_name().to_string_lossy()
+    )
+}
+
 /// Display a sorted list of added, modified, deleted files.
 fn show_file_list(
     added: &HashMap<&OsStr, &FileEntry>,
     deleted: &HashMap<&OsStr, &FileEntry>,
-    modified: &HashMap<&OsStr, &FileEntry>,
+    modified: &HashMap<&OsStr, (&FileEntry, ChangedProperties)>,
 ) {
     let mut all: Vec<&OsStr> = Vec::new();
 
@@ -415,28 +532,24 @@ fn show_file_list(
     all.sort();
 
     for file in all {
-        let (op, entry) = if let Some(entry) = added.get(file) {
-            ("A", *entry)
+        let (op, entry, changed) = if let Some(entry) = added.get(file) {
+            ("A", entry, ChangedProperties::default())
         } else if let Some(entry) = deleted.get(file) {
-            ("D", *entry)
-        } else if let Some(entry) = modified.get(file) {
-            ("M", *entry)
+            ("D", entry, ChangedProperties::default())
+        } else if let Some((entry, changed)) = modified.get(file) {
+            ("M", entry, *changed)
         } else {
             unreachable!();
         };
 
-        let entry_kind = match entry.kind() {
-            EntryKind::Symlink(_) => "l",
-            EntryKind::Hardlink(_) => "h",
-            EntryKind::Device(_) if entry.metadata().stat.is_blockdev() => "b",
-            EntryKind::Device(_) => "c",
-            EntryKind::Socket => "s",
-            EntryKind::Fifo => "p",
-            EntryKind::File { .. } => "f",
-            EntryKind::Directory => "d",
-            _ => " ",
-        };
+        let entry_type = format_entry_type(entry, changed.entry_type);
+        let uid = format_uid(entry, changed.uid);
+        let gid = format_gid(entry, changed.gid);
+        let mode = format_mode(entry, changed.mode);
+        let size = format_filesize(entry, changed.size);
+        let mtime = format_mtime(entry, changed.mtime);
+        let name = format_file_name(entry, changed.content);
 
-        println!("{} {} {}", op, entry_kind, file.to_string_lossy());
+        println!("{op} {entry_type:>2} {mode:>5} {uid:>6} {gid:>6} {size:>10} {mtime:11} {name}");
     }
 }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 9+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to `diff archive` command
  2022-11-29 14:17 [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Lukas Wagner
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 1/2] debug cli: show more file attributes for `diff archive` command Lukas Wagner
@ 2022-11-29 14:17 ` Lukas Wagner
  2022-12-01 13:29   ` Stefan Hanreich
  2022-11-30 16:27 ` [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Thomas Lamprecht
  2022-12-01 13:29 ` Stefan Hanreich
  3 siblings, 1 reply; 9+ messages in thread
From: Lukas Wagner @ 2022-11-29 14:17 UTC (permalink / raw)
  To: pbs-devel

When --compare-content is set, the command will compare the
file content instead on relying on mtime to detect modified files.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 src/bin/proxmox_backup_debug/diff.rs | 141 +++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 6 deletions(-)

diff --git a/src/bin/proxmox_backup_debug/diff.rs b/src/bin/proxmox_backup_debug/diff.rs
index d5a4a1fe..9b916760 100644
--- a/src/bin/proxmox_backup_debug/diff.rs
+++ b/src/bin/proxmox_backup_debug/diff.rs
@@ -28,12 +28,15 @@ use pbs_tools::json::required_string_param;
 use pxar::accessor::ReadAt;
 use pxar::EntryKind;
 use serde_json::Value;
+use tokio::io::AsyncReadExt;
 
 type ChunkDigest = [u8; 32];
 type FileEntry = pxar::accessor::aio::FileEntry<Arc<dyn ReadAt + Send + Sync>>;
 type Accessor = pxar::accessor::aio::Accessor<Arc<dyn ReadAt + Send + Sync>>;
 type Directory = pxar::accessor::aio::Directory<Arc<dyn ReadAt + Send + Sync>>;
 
+const BUFFERSIZE: usize = 1024;
+
 pub fn diff_commands() -> CommandLineInterface {
     let cmd_def = CliCommandMap::new().insert(
         "archive",
@@ -79,18 +82,30 @@ pub fn diff_commands() -> CommandLineInterface {
                 schema: KEYFD_SCHEMA,
                 optional: true,
             },
+            "compare-content": {
+                optional: true,
+                type: bool,
+                description: "Compare file content rather than solely relying on mtime for detecting modified files.",
+            },
         }
     }
 )]
 /// Diff an archive in two snapshots. The command will output a list of added, modified and deleted files.
-/// For modified files, only the file metadata (e.g. mtime, size, etc.) will be considered. The actual
-/// file contents will not be compared.
+/// For modified files, the file metadata (e.g. mode, uid, gid, size, etc.) will be considered. For detecting
+/// modification of file content, only mtime will be used by default. If the --compare-content flag is provided,
+/// mtime is ignored and file content will be compared.
 async fn diff_archive_cmd(param: Value) -> Result<(), Error> {
     let repo = extract_repository_from_value(&param)?;
     let snapshot_a = required_string_param(&param, "prev-snapshot")?;
     let snapshot_b = required_string_param(&param, "snapshot")?;
     let archive_name = required_string_param(&param, "archive-name")?;
 
+    let compare_contents = match param.get("compare-content") {
+        Some(Value::Bool(value)) => *value,
+        Some(_) => bail!("invalid flag for compare-content"),
+        None => false,
+    };
+
     let namespace = match param.get("ns") {
         Some(Value::String(ns)) => ns.parse()?,
         Some(_) => bail!("invalid namespace parameter"),
@@ -120,7 +135,14 @@ async fn diff_archive_cmd(param: Value) -> Result<(), Error> {
 
     if archive_name.ends_with(".pxar") {
         let file_name = format!("{}.didx", archive_name);
-        diff_archive(snapshot_a, snapshot_b, &file_name, &repo_params).await?;
+        diff_archive(
+            snapshot_a,
+            snapshot_b,
+            &file_name,
+            &repo_params,
+            compare_contents,
+        )
+        .await?;
     } else {
         bail!("Only .pxar files are supported");
     }
@@ -133,6 +155,7 @@ async fn diff_archive(
     snapshot_b: &str,
     file_name: &str,
     repo_params: &RepoParams,
+    compare_contents: bool,
 ) -> Result<(), Error> {
     let (index_a, accessor_a) = open_dynamic_index(snapshot_a, file_name, repo_params).await?;
     let (index_b, accessor_b) = open_dynamic_index(snapshot_b, file_name, repo_params).await?;
@@ -184,7 +207,7 @@ async fn diff_archive(
 
     // ... so we compare the file metadata/contents to narrow the selection down to files
     // which where *really* modified.
-    let modified_files = compare_files(potentially_modified).await?;
+    let modified_files = compare_files(potentially_modified, compare_contents).await?;
 
     show_file_list(&added_files, &deleted_files, &modified_files);
 
@@ -352,11 +375,12 @@ fn visit_directory<'f, 'c>(
 /// Check if files were actually modified
 async fn compare_files<'a>(
     files: HashMap<&'a OsStr, (&'a FileEntry, &'a FileEntry)>,
+    compare_contents: bool,
 ) -> Result<HashMap<&'a OsStr, (&'a FileEntry, ChangedProperties)>, Error> {
     let mut modified_files = HashMap::new();
 
     for (path, (entry_a, entry_b)) in files {
-        if let Some(changed) = compare_file(entry_a, entry_b).await? {
+        if let Some(changed) = compare_file(entry_a, entry_b, compare_contents).await? {
             modified_files.insert(path, (entry_b, changed));
         }
     }
@@ -367,6 +391,7 @@ async fn compare_files<'a>(
 async fn compare_file(
     file_a: &FileEntry,
     file_b: &FileEntry,
+    compare_contents: bool,
 ) -> Result<Option<ChangedProperties>, Error> {
     let mut changed = ChangedProperties::default();
 
@@ -388,6 +413,15 @@ async fn compare_file(
             if size_a != size_b {
                 changed.size = true;
                 changed.content = true;
+            } else if compare_contents {
+                let content_identical = compare_file_contents(file_a, file_b).await?;
+                if content_identical && !changed.any_without_mtime() {
+                    // If the content is identical and nothing, exluding mtime,
+                    // has changed, we don't consider the entry as modified.
+                    changed.mtime = false;
+                }
+
+                changed.content = !content_identical;
             };
         }
         (EntryKind::Directory, EntryKind::Directory) => {}
@@ -405,6 +439,44 @@ async fn compare_file(
     }
 }
 
+async fn compare_file_contents(file_a: &FileEntry, file_b: &FileEntry) -> Result<bool, Error> {
+    let mut contents_a = file_a.contents().await?;
+    let mut contents_b = file_b.contents().await?;
+
+    compare_readers(&mut contents_a, &mut contents_b).await
+}
+
+async fn compare_readers<T>(reader_a: &mut T, reader_b: &mut T) -> Result<bool, Error>
+where
+    T: AsyncReadExt + Unpin,
+{
+    let mut buf_a = vec![0u8; BUFFERSIZE];
+    let mut buf_b = vec![0u8; BUFFERSIZE];
+
+    loop {
+        // normally, a join! would be nice here, but this leads to
+        // some 'higher-order lifetime error' for which I'm not smart enough to solve.
+        // let (bytes_read_a, bytes_read_b) =
+        //     tokio::try_join!(reader_a.read(&mut buf_a), reader_b.read(&mut buf_b))?;
+        let bytes_read_a = reader_a.read(&mut buf_a).await?;
+        let bytes_read_b = reader_b.read(&mut buf_b).await?;
+
+        if bytes_read_a != bytes_read_b {
+            return Ok(false);
+        }
+
+        if bytes_read_a == 0 {
+            break;
+        }
+
+        if buf_a[..bytes_read_a] != buf_b[..bytes_read_b] {
+            return Ok(false);
+        }
+    }
+
+    Ok(true)
+}
+
 #[derive(Copy, Clone, Default)]
 struct ChangedProperties {
     entry_type: bool,
@@ -438,8 +510,11 @@ impl ChangedProperties {
     }
 
     fn any(&self) -> bool {
+        self.any_without_mtime() || self.mtime
+    }
+
+    fn any_without_mtime(&self) -> bool {
         self.entry_type
-            || self.mtime
             || self.acl
             || self.xattrs
             || self.fcaps
@@ -553,3 +628,57 @@ fn show_file_list(
         println!("{op} {entry_type:>2} {mode:>5} {uid:>6} {gid:>6} {size:>10} {mtime:11} {name}");
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    use std::{
+        io::Cursor,
+        pin::Pin,
+        task::{Context, Poll},
+    };
+    use tokio::io::AsyncRead;
+
+    struct MockedAsyncReader(Cursor<Vec<u8>>);
+
+    impl AsyncRead for MockedAsyncReader {
+        fn poll_read(
+            self: Pin<&mut Self>,
+            _cx: &mut Context<'_>,
+            read_buf: &mut tokio::io::ReadBuf<'_>,
+        ) -> std::task::Poll<std::io::Result<()>> {
+            let mut buf = vec![0u8; 100];
+
+            let res = std::io::Read::read(&mut self.get_mut().0, &mut buf);
+
+            if let Ok(bytes) = res {
+                read_buf.put_slice(&buf[..bytes]);
+            }
+
+            Poll::Ready(res.map(|_| ()))
+        }
+    }
+
+    #[test]
+    fn test_do_compare_file_contents() {
+        fn compare(a: Vec<u8>, b: Vec<u8>) -> Result<bool, Error> {
+            let mut mock_a = MockedAsyncReader(Cursor::new(a));
+            let mut mock_b = MockedAsyncReader(Cursor::new(b));
+
+            proxmox_async::runtime::block_on(compare_readers(&mut mock_a, &mut mock_b))
+        }
+
+        assert!(matches!(compare(vec![0; 15], vec![0; 15]), Ok(true)));
+        assert!(matches!(compare(vec![0; 15], vec![0; 14]), Ok(false)));
+        assert!(matches!(compare(vec![0; 15], vec![1; 15]), Ok(false)));
+
+        let mut buf = vec![1u8; 2 * BUFFERSIZE];
+        buf[BUFFERSIZE] = 0;
+        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), Ok(false)));
+
+        let mut buf = vec![1u8; 2 * BUFFERSIZE];
+        buf[2 * BUFFERSIZE - 1] = 0;
+        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), Ok(false)));
+    }
+}
-- 
2.30.2





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive`
  2022-11-29 14:17 [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Lukas Wagner
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 1/2] debug cli: show more file attributes for `diff archive` command Lukas Wagner
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to " Lukas Wagner
@ 2022-11-30 16:27 ` Thomas Lamprecht
  2022-12-01  7:30   ` Lukas Wagner
  2022-12-01 13:29 ` Stefan Hanreich
  3 siblings, 1 reply; 9+ messages in thread
From: Thomas Lamprecht @ 2022-11-30 16:27 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Lukas Wagner

Am 29/11/2022 um 15:17 schrieb Lukas Wagner:
> This patch series contains a few improvements for the `diff archive` tool,
> mainly based on Wolfgang's suggestions.
> 
> First, the output of is now much more detailed and shows
> some relevant file attributes, including what has changed between
> snapshots. Changed attributes are highlighted by a "*".
> 

nothing against this approach, but what about color coding?
(maybe additionally or as either/or?)

anyhow, I'll let wolfgang handle this one as he reviewed the original series too.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive`
  2022-11-30 16:27 ` [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Thomas Lamprecht
@ 2022-12-01  7:30   ` Lukas Wagner
  2022-12-01  8:02     ` Thomas Lamprecht
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Wagner @ 2022-12-01  7:30 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion


On 11/30/22 17:27, Thomas Lamprecht wrote:
> nothing against this approach, but what about color coding?
> (maybe additionally or as either/or?)
> 

That is already on my 'future improvements' list. My approach would probably
be to leave the current output format as-is, and then add coloring on top
as eye candy (on by default, disabled by flag/envvar or if stdout is not a tty)


-- 
Best Regards,
Lukas Wagner




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive`
  2022-12-01  7:30   ` Lukas Wagner
@ 2022-12-01  8:02     ` Thomas Lamprecht
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Lamprecht @ 2022-12-01  8:02 UTC (permalink / raw)
  To: Lukas Wagner, Proxmox Backup Server development discussion

Am 01/12/2022 um 08:30 schrieb Lukas Wagner:
> 
> On 11/30/22 17:27, Thomas Lamprecht wrote:
>> nothing against this approach, but what about color coding?
>> (maybe additionally or as either/or?)
>>
> 
> That is already on my 'future improvements' list. My approach would probably
> be to leave the current output format as-is, and then add coloring on top
> as eye candy (on by default, disabled by flag/envvar or if stdout is not a tty)
> 
> 

Hmm, I could imagine that there might be some that would like to avoid a extra
* adding "noise" if I got all information from the color anyway, but can be always
improved later, if anybody requests it at all.

For the "should I color choice" the nicest and flexibel way that most CLI tools
adhere too is having tree level:

- always
- never
- auto (stdout on if tty)

As with "always" it allows one also for checking out bigger outputs in a pager with
color, e.g., by piping to `less -R`.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to `diff archive` command
  2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to " Lukas Wagner
@ 2022-12-01 13:29   ` Stefan Hanreich
  2022-12-01 13:30     ` Stefan Hanreich
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hanreich @ 2022-12-01 13:29 UTC (permalink / raw)
  To: pbs-devel



On 11/29/22 15:17, Lukas Wagner wrote:
> When --compare-content is set, the command will compare the
> file content instead on relying on mtime to detect modified files.
> 
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
>   src/bin/proxmox_backup_debug/diff.rs | 141 +++++++++++++++++++++++++--
>   1 file changed, 135 insertions(+), 6 deletions(-)
> 
> diff --git a/src/bin/proxmox_backup_debug/diff.rs b/src/bin/proxmox_backup_debug/diff.rs
> index d5a4a1fe..9b916760 100644
> --- a/src/bin/proxmox_backup_debug/diff.rs
> +++ b/src/bin/proxmox_backup_debug/diff.rs
> @@ -28,12 +28,15 @@ use pbs_tools::json::required_string_param;
>   use pxar::accessor::ReadAt;
>   use pxar::EntryKind;
>   use serde_json::Value;
> +use tokio::io::AsyncReadExt;
>   
>   type ChunkDigest = [u8; 32];
>   type FileEntry = pxar::accessor::aio::FileEntry<Arc<dyn ReadAt + Send + Sync>>;
>   type Accessor = pxar::accessor::aio::Accessor<Arc<dyn ReadAt + Send + Sync>>;
>   type Directory = pxar::accessor::aio::Directory<Arc<dyn ReadAt + Send + Sync>>;
>   
> +const BUFFERSIZE: usize = 1024;
> +
>   pub fn diff_commands() -> CommandLineInterface {
>       let cmd_def = CliCommandMap::new().insert(
>           "archive",
> @@ -79,18 +82,30 @@ pub fn diff_commands() -> CommandLineInterface {
>                   schema: KEYFD_SCHEMA,
>                   optional: true,
>               },
> +            "compare-content": {
> +                optional: true,
> +                type: bool,
> +                description: "Compare file content rather than solely relying on mtime for detecting modified files.",
> +            },
>           }
>       }
>   )]
>   /// Diff an archive in two snapshots. The command will output a list of added, modified and deleted files.
> -/// For modified files, only the file metadata (e.g. mtime, size, etc.) will be considered. The actual
> -/// file contents will not be compared.
> +/// For modified files, the file metadata (e.g. mode, uid, gid, size, etc.) will be considered. For detecting
> +/// modification of file content, only mtime will be used by default. If the --compare-content flag is provided,
> +/// mtime is ignored and file content will be compared.
>   async fn diff_archive_cmd(param: Value) -> Result<(), Error> {
>       let repo = extract_repository_from_value(&param)?;
>       let snapshot_a = required_string_param(&param, "prev-snapshot")?;
>       let snapshot_b = required_string_param(&param, "snapshot")?;
>       let archive_name = required_string_param(&param, "archive-name")?;
>   
> +    let compare_contents = match param.get("compare-content") {
> +        Some(Value::Bool(value)) => *value,
> +        Some(_) => bail!("invalid flag for compare-content"),
> +        None => false,
> +    };
> +
>       let namespace = match param.get("ns") {
>           Some(Value::String(ns)) => ns.parse()?,
>           Some(_) => bail!("invalid namespace parameter"),
> @@ -120,7 +135,14 @@ async fn diff_archive_cmd(param: Value) -> Result<(), Error> {
>   
>       if archive_name.ends_with(".pxar") {
>           let file_name = format!("{}.didx", archive_name);
> -        diff_archive(snapshot_a, snapshot_b, &file_name, &repo_params).await?;
> +        diff_archive(
> +            snapshot_a,
> +            snapshot_b,
> +            &file_name,
> +            &repo_params,
> +            compare_contents,
> +        )
> +        .await?;
>       } else {
>           bail!("Only .pxar files are supported");
>       }
> @@ -133,6 +155,7 @@ async fn diff_archive(
>       snapshot_b: &str,
>       file_name: &str,
>       repo_params: &RepoParams,
> +    compare_contents: bool,
>   ) -> Result<(), Error> {
>       let (index_a, accessor_a) = open_dynamic_index(snapshot_a, file_name, repo_params).await?;
>       let (index_b, accessor_b) = open_dynamic_index(snapshot_b, file_name, repo_params).await?;
> @@ -184,7 +207,7 @@ async fn diff_archive(
>   
>       // ... so we compare the file metadata/contents to narrow the selection down to files
>       // which where *really* modified.
> -    let modified_files = compare_files(potentially_modified).await?;
> +    let modified_files = compare_files(potentially_modified, compare_contents).await?;
>   
>       show_file_list(&added_files, &deleted_files, &modified_files);
>   
> @@ -352,11 +375,12 @@ fn visit_directory<'f, 'c>(
>   /// Check if files were actually modified
>   async fn compare_files<'a>(
>       files: HashMap<&'a OsStr, (&'a FileEntry, &'a FileEntry)>,
> +    compare_contents: bool,
>   ) -> Result<HashMap<&'a OsStr, (&'a FileEntry, ChangedProperties)>, Error> {
>       let mut modified_files = HashMap::new();
>   
>       for (path, (entry_a, entry_b)) in files {
> -        if let Some(changed) = compare_file(entry_a, entry_b).await? {
> +        if let Some(changed) = compare_file(entry_a, entry_b, compare_contents).await? {
>               modified_files.insert(path, (entry_b, changed));
>           }
>       }
> @@ -367,6 +391,7 @@ async fn compare_files<'a>(
>   async fn compare_file(
>       file_a: &FileEntry,
>       file_b: &FileEntry,
> +    compare_contents: bool,
>   ) -> Result<Option<ChangedProperties>, Error> {
>       let mut changed = ChangedProperties::default();
>   
> @@ -388,6 +413,15 @@ async fn compare_file(
>               if size_a != size_b {
>                   changed.size = true;
>                   changed.content = true;
> +            } else if compare_contents {
> +                let content_identical = compare_file_contents(file_a, file_b).await?;
> +                if content_identical && !changed.any_without_mtime() {
> +                    // If the content is identical and nothing, exluding mtime,
> +                    // has changed, we don't consider the entry as modified.
> +                    changed.mtime = false;
> +                }
> +
> +                changed.content = !content_identical;
>               };
>           }
>           (EntryKind::Directory, EntryKind::Directory) => {}
> @@ -405,6 +439,44 @@ async fn compare_file(
>       }
>   }
>   
> +async fn compare_file_contents(file_a: &FileEntry, file_b: &FileEntry) -> Result<bool, Error> {
> +    let mut contents_a = file_a.contents().await?;
> +    let mut contents_b = file_b.contents().await?;
> +
> +    compare_readers(&mut contents_a, &mut contents_b).await
> +}
> +
> +async fn compare_readers<T>(reader_a: &mut T, reader_b: &mut T) -> Result<bool, Error>
> +where
> +    T: AsyncReadExt + Unpin,
> +{
> +    let mut buf_a = vec![0u8; BUFFERSIZE];
> +    let mut buf_b = vec![0u8; BUFFERSIZE];
> +
> +    loop {
> +        // normally, a join! would be nice here, but this leads to
> +        // some 'higher-order lifetime error' for which I'm not smart enough to solve.
> +        // let (bytes_read_a, bytes_read_b) =
> +        //     tokio::try_join!(reader_a.read(&mut buf_a), reader_b.read(&mut buf_b))?;
> +        let bytes_read_a = reader_a.read(&mut buf_a).await?;
> +        let bytes_read_b = reader_b.read(&mut buf_b).await?;
> +
> +        if bytes_read_a != bytes_read_b {
> +            return Ok(false);
> +        }
> +
> +        if bytes_read_a == 0 {
> +            break;
> +        }
> +
> +        if buf_a[..bytes_read_a] != buf_b[..bytes_read_b] {
> +            return Ok(false);
> +        }
> +    }
> +
> +    Ok(true)
> +}
> +
>   #[derive(Copy, Clone, Default)]
>   struct ChangedProperties {
>       entry_type: bool,
> @@ -438,8 +510,11 @@ impl ChangedProperties {
>       }
>   
>       fn any(&self) -> bool {
> +        self.any_without_mtime() || self.mtime
> +    }
> +
> +    fn any_without_mtime(&self) -> bool {
>           self.entry_type
> -            || self.mtime
>               || self.acl
>               || self.xattrs
>               || self.fcaps
> @@ -553,3 +628,57 @@ fn show_file_list(
>           println!("{op} {entry_type:>2} {mode:>5} {uid:>6} {gid:>6} {size:>10} {mtime:11} {name}");
>       }
>   }
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    use std::{
> +        io::Cursor,
> +        pin::Pin,
> +        task::{Context, Poll},
> +    };
> +    use tokio::io::AsyncRead;
> +
> +    struct MockedAsyncReader(Cursor<Vec<u8>>);
> +
> +    impl AsyncRead for MockedAsyncReader {
> +        fn poll_read(
> +            self: Pin<&mut Self>,
> +            _cx: &mut Context<'_>,
> +            read_buf: &mut tokio::io::ReadBuf<'_>,
> +        ) -> std::task::Poll<std::io::Result<()>> {

very minor nit: Poll already gets imported above, so it could be removed

> +            let mut buf = vec![0u8; 100];
> +
> +            let res = std::io::Read::read(&mut self.get_mut().0, &mut buf);
> +
> +            if let Ok(bytes) = res {
> +                read_buf.put_slice(&buf[..bytes]);
> +            }
> +
> +            Poll::Ready(res.map(|_| ()))
> +        }
> +    }
> +
> +    #[test]
> +    fn test_do_compare_file_contents() {
> +        fn compare(a: Vec<u8>, b: Vec<u8>) -> Result<bool, Error> {
> +            let mut mock_a = MockedAsyncReader(Cursor::new(a));
> +            let mut mock_b = MockedAsyncReader(Cursor::new(b));
> +
> +            proxmox_async::runtime::block_on(compare_readers(&mut mock_a, &mut mock_b))
> +        }
> +
> +        assert!(matches!(compare(vec![0; 15], vec![0; 15]), Ok(true)));
> +        assert!(matches!(compare(vec![0; 15], vec![0; 14]), Ok(false)));
> +        assert!(matches!(compare(vec![0; 15], vec![1; 15]), Ok(false)));
> +
> +        let mut buf = vec![1u8; 2 * BUFFERSIZE];
> +        buf[BUFFERSIZE] = 0;
> +        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), Ok(false)));
> +
> +        let mut buf = vec![1u8; 2 * BUFFERSIZE];
> +        buf[2 * BUFFERSIZE - 1] = 0;
> +        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), Ok(false)));
> +    }
> +}




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive`
  2022-11-29 14:17 [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Lukas Wagner
                   ` (2 preceding siblings ...)
  2022-11-30 16:27 ` [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Thomas Lamprecht
@ 2022-12-01 13:29 ` Stefan Hanreich
  3 siblings, 0 replies; 9+ messages in thread
From: Stefan Hanreich @ 2022-12-01 13:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Lukas Wagner

[-- Attachment #1: Type: text/plain, Size: 2695 bytes --]

I reviewed this by generating 100 random files via the following command:

seq -w 1 100 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=file.% 
bs=$(shuf -i1-10 -n1) count=1024'

Then I created a backup from those files, recreated the files with the 
above command and created a second backup.

Additionally I created a 3rd backup where I:

  * changed uid & gid of 10 files
  * changed permissions of 10 files
  * deleted 10 files
  * created 10 new files
  * moved some of the files into a new subdirectory
  * touched 10 files

All changes between 1 ->2 and 2 -> 3 and 1 -> 3 were picked up by the 
diff tool as far as I could tell.

Some files where the content has changed but everything else stayed the 
same (except for mtime), were not marked as changed without the flag 
--compare-content. Maybe we should not mark any files as changed without 
the --compare-content flag or we should mark all files as changed where 
any attribute changed?

With --compare-content the changes were picked up, which is what this 
flag is for, so I think it's OK.


Code LGTM!


Reviewed-by: Stefan Hanreich <s.hanreich@proxmox.com>
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>

On 11/29/22 15:17, Lukas Wagner wrote:
> This patch series contains a few improvements for the `diff archive` tool,
> mainly based on Wolfgang's suggestions.
>
> First, the output of is now much more detailed and shows
> some relevant file attributes, including what has changed between
> snapshots. Changed attributes are highlighted by a "*".
>
> For instance:
>
> $ proxmox-backup-debug diff archive ...
> A  f   644  10045  10000     0 B  2022-11-28 13:44:51  add.txt
> M  f   644  10045  10000     6 B *2022-11-28 13:45:05  content.txt
> D  f   644  10045  10000     0 B  2022-11-28 13:17:09  deleted.txt
> M  f   644  10045    *29     0 B  2022-11-28 13:16:20  gid.txt
> M  f  *777  10045  10000     0 B  2022-11-28 13:42:47  mode.txt
> M  f   644  10045  10000     0 B *2022-11-28 13:44:33  mtime.txt
> M  f   644  10045  10000    *7 B *2022-11-28 13:44:59 *size.txt
> M  f   644 *64045  10000     0 B  2022-11-28 13:16:18  uid.txt
> M *f   644  10045  10000    10 B  2022-11-28 13:44:59  type.txt
>
> The second commit introduces the possiblity to pass
> the --compare-content flag to the tool. If the flag is passed,
> the tool will compare the file content instead of relying on mtime
> alone to detect modifications.
>
> Lukas Wagner (2):
>    debug cli: show more file attributes for `diff archive` command
>    debug cli: add 'compare-content' flag to `diff archive` command
>
>   src/bin/proxmox_backup_debug/diff.rs | 356 ++++++++++++++++++++++-----
>   1 file changed, 299 insertions(+), 57 deletions(-)
>

[-- Attachment #2: Type: text/html, Size: 3499 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to `diff archive` command
  2022-12-01 13:29   ` Stefan Hanreich
@ 2022-12-01 13:30     ` Stefan Hanreich
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Hanreich @ 2022-12-01 13:30 UTC (permalink / raw)
  To: pbs-devel

On 12/1/22 14:29, Stefan Hanreich wrote:
>
>
> On 11/29/22 15:17, Lukas Wagner wrote:
>> When --compare-content is set, the command will compare the
>> file content instead on relying on mtime to detect modified files.
>>
>> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
>> ---
>>   src/bin/proxmox_backup_debug/diff.rs | 141 +++++++++++++++++++++++++--
>>   1 file changed, 135 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/bin/proxmox_backup_debug/diff.rs 
>> b/src/bin/proxmox_backup_debug/diff.rs
>> index d5a4a1fe..9b916760 100644
>> --- a/src/bin/proxmox_backup_debug/diff.rs
>> +++ b/src/bin/proxmox_backup_debug/diff.rs
>> @@ -28,12 +28,15 @@ use pbs_tools::json::required_string_param;
>>   use pxar::accessor::ReadAt;
>>   use pxar::EntryKind;
>>   use serde_json::Value;
>> +use tokio::io::AsyncReadExt;
>>     type ChunkDigest = [u8; 32];
>>   type FileEntry = pxar::accessor::aio::FileEntry<Arc<dyn ReadAt + 
>> Send + Sync>>;
>>   type Accessor = pxar::accessor::aio::Accessor<Arc<dyn ReadAt + Send 
>> + Sync>>;
>>   type Directory = pxar::accessor::aio::Directory<Arc<dyn ReadAt + 
>> Send + Sync>>;
>>   +const BUFFERSIZE: usize = 1024;
>> +
>>   pub fn diff_commands() -> CommandLineInterface {
>>       let cmd_def = CliCommandMap::new().insert(
>>           "archive",
>> @@ -79,18 +82,30 @@ pub fn diff_commands() -> CommandLineInterface {
>>                   schema: KEYFD_SCHEMA,
>>                   optional: true,
>>               },
>> +            "compare-content": {
>> +                optional: true,
>> +                type: bool,
>> +                description: "Compare file content rather than 
>> solely relying on mtime for detecting modified files.",
>> +            },
>>           }
>>       }
>>   )]
>>   /// Diff an archive in two snapshots. The command will output a 
>> list of added, modified and deleted files.
>> -/// For modified files, only the file metadata (e.g. mtime, size, 
>> etc.) will be considered. The actual
>> -/// file contents will not be compared.
>> +/// For modified files, the file metadata (e.g. mode, uid, gid, 
>> size, etc.) will be considered. For detecting
>> +/// modification of file content, only mtime will be used by 
>> default. If the --compare-content flag is provided,
>> +/// mtime is ignored and file content will be compared.
>>   async fn diff_archive_cmd(param: Value) -> Result<(), Error> {
>>       let repo = extract_repository_from_value(&param)?;
>>       let snapshot_a = required_string_param(&param, "prev-snapshot")?;
>>       let snapshot_b = required_string_param(&param, "snapshot")?;
>>       let archive_name = required_string_param(&param, "archive-name")?;
>>   +    let compare_contents = match param.get("compare-content") {
>> +        Some(Value::Bool(value)) => *value,
>> +        Some(_) => bail!("invalid flag for compare-content"),
>> +        None => false,
>> +    };
>> +
>>       let namespace = match param.get("ns") {
>>           Some(Value::String(ns)) => ns.parse()?,
>>           Some(_) => bail!("invalid namespace parameter"),
>> @@ -120,7 +135,14 @@ async fn diff_archive_cmd(param: Value) -> 
>> Result<(), Error> {
>>         if archive_name.ends_with(".pxar") {
>>           let file_name = format!("{}.didx", archive_name);
>> -        diff_archive(snapshot_a, snapshot_b, &file_name, 
>> &repo_params).await?;
>> +        diff_archive(
>> +            snapshot_a,
>> +            snapshot_b,
>> +            &file_name,
>> +            &repo_params,
>> +            compare_contents,
>> +        )
>> +        .await?;
>>       } else {
>>           bail!("Only .pxar files are supported");
>>       }
>> @@ -133,6 +155,7 @@ async fn diff_archive(
>>       snapshot_b: &str,
>>       file_name: &str,
>>       repo_params: &RepoParams,
>> +    compare_contents: bool,
>>   ) -> Result<(), Error> {
>>       let (index_a, accessor_a) = open_dynamic_index(snapshot_a, 
>> file_name, repo_params).await?;
>>       let (index_b, accessor_b) = open_dynamic_index(snapshot_b, 
>> file_name, repo_params).await?;
>> @@ -184,7 +207,7 @@ async fn diff_archive(
>>         // ... so we compare the file metadata/contents to narrow the 
>> selection down to files
>>       // which where *really* modified.
>> -    let modified_files = compare_files(potentially_modified).await?;
>> +    let modified_files = compare_files(potentially_modified, 
>> compare_contents).await?;
>>         show_file_list(&added_files, &deleted_files, &modified_files);
>>   @@ -352,11 +375,12 @@ fn visit_directory<'f, 'c>(
>>   /// Check if files were actually modified
>>   async fn compare_files<'a>(
>>       files: HashMap<&'a OsStr, (&'a FileEntry, &'a FileEntry)>,
>> +    compare_contents: bool,
>>   ) -> Result<HashMap<&'a OsStr, (&'a FileEntry, ChangedProperties)>, 
>> Error> {
>>       let mut modified_files = HashMap::new();
>>         for (path, (entry_a, entry_b)) in files {
>> -        if let Some(changed) = compare_file(entry_a, entry_b).await? {
>> +        if let Some(changed) = compare_file(entry_a, entry_b, 
>> compare_contents).await? {
>>               modified_files.insert(path, (entry_b, changed));
>>           }
>>       }
>> @@ -367,6 +391,7 @@ async fn compare_files<'a>(
>>   async fn compare_file(
>>       file_a: &FileEntry,
>>       file_b: &FileEntry,
>> +    compare_contents: bool,
>>   ) -> Result<Option<ChangedProperties>, Error> {
>>       let mut changed = ChangedProperties::default();
>>   @@ -388,6 +413,15 @@ async fn compare_file(
>>               if size_a != size_b {
>>                   changed.size = true;
>>                   changed.content = true;
>> +            } else if compare_contents {
>> +                let content_identical = 
>> compare_file_contents(file_a, file_b).await?;
>> +                if content_identical && !changed.any_without_mtime() {
>> +                    // If the content is identical and nothing, 
>> exluding mtime,
>> +                    // has changed, we don't consider the entry as 
>> modified.
>> +                    changed.mtime = false;
>> +                }
>> +
>> +                changed.content = !content_identical;
>>               };
>>           }
>>           (EntryKind::Directory, EntryKind::Directory) => {}
>> @@ -405,6 +439,44 @@ async fn compare_file(
>>       }
>>   }
>>   +async fn compare_file_contents(file_a: &FileEntry, file_b: 
>> &FileEntry) -> Result<bool, Error> {
>> +    let mut contents_a = file_a.contents().await?;
>> +    let mut contents_b = file_b.contents().await?;
>> +
>> +    compare_readers(&mut contents_a, &mut contents_b).await
>> +}
>> +
>> +async fn compare_readers<T>(reader_a: &mut T, reader_b: &mut T) -> 
>> Result<bool, Error>
>> +where
>> +    T: AsyncReadExt + Unpin,
>> +{
>> +    let mut buf_a = vec![0u8; BUFFERSIZE];
>> +    let mut buf_b = vec![0u8; BUFFERSIZE];
>> +
>> +    loop {
>> +        // normally, a join! would be nice here, but this leads to
>> +        // some 'higher-order lifetime error' for which I'm not 
>> smart enough to solve.
>> +        // let (bytes_read_a, bytes_read_b) =
>> +        //     tokio::try_join!(reader_a.read(&mut buf_a), 
>> reader_b.read(&mut buf_b))?;
>> +        let bytes_read_a = reader_a.read(&mut buf_a).await?;
>> +        let bytes_read_b = reader_b.read(&mut buf_b).await?;
>> +
>> +        if bytes_read_a != bytes_read_b {
>> +            return Ok(false);
>> +        }
>> +
>> +        if bytes_read_a == 0 {
>> +            break;
>> +        }
>> +
>> +        if buf_a[..bytes_read_a] != buf_b[..bytes_read_b] {
>> +            return Ok(false);
>> +        }
>> +    }
>> +
>> +    Ok(true)
>> +}
>> +
>>   #[derive(Copy, Clone, Default)]
>>   struct ChangedProperties {
>>       entry_type: bool,
>> @@ -438,8 +510,11 @@ impl ChangedProperties {
>>       }
>>         fn any(&self) -> bool {
>> +        self.any_without_mtime() || self.mtime
>> +    }
>> +
>> +    fn any_without_mtime(&self) -> bool {
>>           self.entry_type
>> -            || self.mtime
>>               || self.acl
>>               || self.xattrs
>>               || self.fcaps
>> @@ -553,3 +628,57 @@ fn show_file_list(
>>           println!("{op} {entry_type:>2} {mode:>5} {uid:>6} {gid:>6} 
>> {size:>10} {mtime:11} {name}");
>>       }
>>   }
>> +
>> +#[cfg(test)]
>> +mod tests {
>> +    use super::*;
>> +
>> +    use std::{
>> +        io::Cursor,
>> +        pin::Pin,
>> +        task::{Context, Poll},
>> +    };
>> +    use tokio::io::AsyncRead;
>> +
>> +    struct MockedAsyncReader(Cursor<Vec<u8>>);
>> +
>> +    impl AsyncRead for MockedAsyncReader {
>> +        fn poll_read(
>> +            self: Pin<&mut Self>,
>> +            _cx: &mut Context<'_>,
>> +            read_buf: &mut tokio::io::ReadBuf<'_>,
>> +        ) -> std::task::Poll<std::io::Result<()>> {
>
> very minor nit: Poll already gets imported above, so it could be removed
I mean std::task:: of course
>
>> +            let mut buf = vec![0u8; 100];
>> +
>> +            let res = std::io::Read::read(&mut self.get_mut().0, 
>> &mut buf);
>> +
>> +            if let Ok(bytes) = res {
>> +                read_buf.put_slice(&buf[..bytes]);
>> +            }
>> +
>> +            Poll::Ready(res.map(|_| ()))
>> +        }
>> +    }
>> +
>> +    #[test]
>> +    fn test_do_compare_file_contents() {
>> +        fn compare(a: Vec<u8>, b: Vec<u8>) -> Result<bool, Error> {
>> +            let mut mock_a = MockedAsyncReader(Cursor::new(a));
>> +            let mut mock_b = MockedAsyncReader(Cursor::new(b));
>> +
>> + proxmox_async::runtime::block_on(compare_readers(&mut mock_a, &mut 
>> mock_b))
>> +        }
>> +
>> +        assert!(matches!(compare(vec![0; 15], vec![0; 15]), Ok(true)));
>> +        assert!(matches!(compare(vec![0; 15], vec![0; 14]), 
>> Ok(false)));
>> +        assert!(matches!(compare(vec![0; 15], vec![1; 15]), 
>> Ok(false)));
>> +
>> +        let mut buf = vec![1u8; 2 * BUFFERSIZE];
>> +        buf[BUFFERSIZE] = 0;
>> +        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), 
>> Ok(false)));
>> +
>> +        let mut buf = vec![1u8; 2 * BUFFERSIZE];
>> +        buf[2 * BUFFERSIZE - 1] = 0;
>> +        assert!(matches!(compare(vec![1u8; 2 * BUFFERSIZE], buf), 
>> Ok(false)));
>> +    }
>> +}
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-12-01 13:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 14:17 [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Lukas Wagner
2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 1/2] debug cli: show more file attributes for `diff archive` command Lukas Wagner
2022-11-29 14:17 ` [pbs-devel] [PATCH proxmox-backup 2/2] debug cli: add 'compare-content' flag to " Lukas Wagner
2022-12-01 13:29   ` Stefan Hanreich
2022-12-01 13:30     ` Stefan Hanreich
2022-11-30 16:27 ` [pbs-devel] [PATCH proxmox-backup 0/2] debug cli: improve output, optionally compare file content for `diff archive` Thomas Lamprecht
2022-12-01  7:30   ` Lukas Wagner
2022-12-01  8:02     ` Thomas Lamprecht
2022-12-01 13:29 ` Stefan Hanreich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal