public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Filip Schauer <f.schauer@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH vma-to-pbs] Initial commit
Date: Tue, 26 Sep 2023 15:11:37 +0200	[thread overview]
Message-ID: <7qha55edsqqk6gy7etu7agyeh5itze2i6ocvbc6vp2n3dstgqt@g4a4m36mq5l7> (raw)
In-Reply-To: <20230922124111.121415-1-f.schauer@proxmox.com>

First things first: please use rustfmt (and clippy, I'm pretty sure some
of the things, eg `pbs_err`'s mutability casts, should be caught by it).

Secondly: this is not a library, so it would be perfectly fine to use
`anyhow` for error handling here.
In fact, the panics/unwraps more than likely cause some errors to come
out without any useful context/info, which probably won't produce useful
output when users run into bugs.
That said, a lot of the unwraps are fine, especially all those
string<->C-string conversions... but those are an entire topic anyway
(see further down).

On Fri, Sep 22, 2023 at 02:41:11PM +0200, Filip Schauer wrote:
> Implement a tool to import VMA files into a Proxmox Backup Server
> 
> Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
> ---
> I could not assign this to one distinct repository, so I propose a new
> one. In its current state this will not be able to build a deb package
> yet since the debian directory is missing.

A separate repository is probably fine for a standalone binary.

> The program calls extern "C" functions from proxmox-backup-qemu to
> interface with the Backup Server. I chose these functions since those
> are the only abstracted functions I could find.

Ideally we'd have a public-facing version of the `pbs-client` crate at
some point. The `proxmox-backup-qemu` code could be used as a base to
design the public facing API for it. In its _current_ state, I'd
personally describe `pbs-client`'s API as indecipherable and not
suitable for a public facing crate.

On the one hand it would definitely be nice to avoid the intermediate C
api layer, but at the same time this would just double the amount of
code that needs updating when pbs-client changes.

So for now I suppose this would be fine, but I would definitely like to
see an actual usable pbs client crate that ends up replacing
`proxmox-backup-qemu`'s use of the `pbs-client` crate as well. This
should be more maintainable.

Optionally, since you're using proxmox-backup-qemu as a
submodule, we could consider exposing some rust-typed functions there as
an initial public API draft.
Eg. instead of using `proxmox_backup_new_ns()`, its building of
`BackupSetup` could be moved into a `BackupSetup::new()` which could be
a `pub fn` to at least skip over the `str->Cstr->str` chain.

> The proxmox-sys dependency is used for reading passwords from the tty.
> 
>  .cargo/config                  |   5 +
>  .gitmodules                    |   6 +
>  Cargo.toml                     |  16 ++
>  Makefile                       |  70 ++++++++
>  src/main.rs                    | 266 +++++++++++++++++++++++++++++
>  src/vma.rs                     | 297 +++++++++++++++++++++++++++++++++
>  submodules/proxmox             |   1 +
>  submodules/proxmox-backup-qemu |   1 +
>  8 files changed, 662 insertions(+)
>  create mode 100644 .cargo/config
>  create mode 100644 .gitmodules
>  create mode 100644 Cargo.toml
>  create mode 100644 Makefile
>  create mode 100644 src/main.rs
>  create mode 100644 src/vma.rs
>  create mode 160000 submodules/proxmox
>  create mode 160000 submodules/proxmox-backup-qemu
> 
> diff --git a/.cargo/config b/.cargo/config
> new file mode 100644
> index 0000000..3b5b6e4
> --- /dev/null
> +++ b/.cargo/config
> @@ -0,0 +1,5 @@
> +[source]
> +[source.debian-packages]
> +directory = "/usr/share/cargo/registry"
> +[source.crates-io]
> +replace-with = "debian-packages"
> diff --git a/.gitmodules b/.gitmodules
> new file mode 100644
> index 0000000..526f5ef
> --- /dev/null
> +++ b/.gitmodules
> @@ -0,0 +1,6 @@
> +[submodule "submodules/proxmox-backup-qemu"]
> +	path = submodules/proxmox-backup-qemu
> +	url = git://git.proxmox.com/git/proxmox-backup-qemu.git
> +[submodule "submodules/proxmox"]
> +	path = submodules/proxmox
> +	url = git://git.proxmox.com/git/proxmox.git

^ We don't need proxmox.git as submodule.

> diff --git a/Cargo.toml b/Cargo.toml
> new file mode 100644
> index 0000000..72e2812
> --- /dev/null
> +++ b/Cargo.toml
> @@ -0,0 +1,16 @@
> +[package]
> +name = "vma-to-pbs"
> +version = "0.0.1"
> +authors = ["Filip Schauer <f.schauer@proxmox.com>"]
> +edition = "2021"
> +
> +[dependencies]

Please sort dependencies, and group them so that the submodule
dependencies are a separate group, and our own packages are another.

> +getopts = "0.2"

I'm really not a friend of something that forces path based arguments to
be `String`s.
It accepts `OsStr` as *input* but only produces `String`s as output.
This crate should never ever be used for serious programs!
That said - the VMA files that exist - coming most likely exclusively
from PVE - won't have non-utf8 paths... Still... I can't take this crate
seriously in its current form.
To my knowledge, `clap` is the only argument parser that supports
sticking with `OsStrings` throughout.

> +proxmox-sys = { path = "submodules/proxmox/proxmox-sys" }
> +proxmox-backup-qemu = { path = "submodules/proxmox-backup-qemu" }
> +proxmox-uuid = "1"
> +md5 = "0.7.0"
> +bincode = "1.3"
> +serde = "1.0"

I'm a bit skeptical here. The serde & bincode combination does not allow
for deserializing into a pre-allocated memory location and the structs
here are very large. You're placing some multiple megabytes of data onto
the stack here.
IMO it would be better to just read into a Box<[u8; N]>, cast it to a
Box<T> and do the byte-swapping in-place.
I'll have to think about that. This seems somewhat convenient, but in
the past, with rust, large structs on the stack have been a bit of a
crashy mine-field.

> +serde-big-array = "0.4.1"
> +array-init = "2.0.1"
> diff --git a/src/main.rs b/src/main.rs
> new file mode 100644
> index 0000000..1e8c240
> --- /dev/null
> +++ b/src/main.rs
> @@ -0,0 +1,266 @@
> +extern crate getopts;
> +extern crate proxmox_sys;
> +extern crate proxmox_backup_qemu;
> +
> +use std::time::{SystemTime, UNIX_EPOCH};
> +use std::ffi::{c_char, CStr, CString};
> +use std::env;
> +use std::ptr;
> +
> +use getopts::Options;
> +use proxmox_sys::linux::tty;
> +use proxmox_backup_qemu::*;
> +
> +mod vma;
> +use vma::*;
> +
> +fn print_usage(program: &str, opts: Options) {

This should have 2 variants: when called via `--help` it should print to
`stdout`, when called due to an error it should print to `stderr`.

> +    let brief = format!("Usage: {} vma_file [options]", program);
> +    print!("{}", opts.usage(&brief));
> +}
> +
> +fn backup_vma_to_pbs(
> +    vma_file_path: String,
> +    pbs_repository: String,
> +    backup_id: String,
> +    pbs_password: String,
> +    keyfile: Option<String>,
> +    key_password: Option<String>,
> +    master_keyfile: Option<String>,
> +    fingerprint: String,
> +    compress: bool,
> +    encrypt: bool
> +) {
> +    println!("VMA input file: {}", vma_file_path);
> +    println!("PBS repository: {}", pbs_repository);
> +    println!("PBS fingerprint: {}", fingerprint);
> +    println!("compress: {}", compress);
> +    println!("encrypt: {}", encrypt);
> +
> +    let backup_time = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
> +    println!("backup time: {}", backup_time);
> +
> +    let pbs_err: *mut c_char = ptr::null_mut();

^ You're actually mutating this, so this should be `let mut pbs_err`.
Otherwise you're basically producing undefined behavior since you cast
it into a mutable value.
All those
    &pbs_err as *const _ as *mut _
expressions can in facto simply be
    &mut pbs_err

> +
> +    let pbs_repository_cstr = CString::new(pbs_repository).unwrap();
> +    let backup_id_cstr = CString::new(backup_id).unwrap();
> +    let pbs_password_cstr = CString::new(pbs_password).unwrap();
> +    let fingerprint_cstr = CString::new(fingerprint).unwrap();
> +    let keyfile_cstr = match keyfile {
> +        Some(x) => Some(CString::new(x).unwrap()),
> +        None => None
> +    };
> +    let keyfile_ptr = match keyfile_cstr {
> +        Some(x) => x.as_ptr(),
> +        None => ptr::null()
> +    };

^ I think all these pairs are more readable this way:

    let keyfile_cstr = keyfile.map(|v| CString::new(v).unwrap());
    let keyfile_ptr = keyfile_cstr.map(|v| v.as_ptr()).unwrap_or(ptr::null());

at least it's more concise.
But not a big deal, combinators also take some getting used to and might
not be the more readable variant for everyone. It's a bit annoying that
we need the cstr binding separately for its lifetime, which is another
thing we could avoid with a more rust-y API for the backup client...

> +    let key_password_cstr = match key_password {
> +        Some(x) => Some(CString::new(x).unwrap()),
> +        None => None
> +    };
> +    let key_password_ptr = match key_password_cstr {
> +        Some(x) => x.as_ptr(),
> +        None => ptr::null()
> +    };
> +    let master_keyfile_cstr = match master_keyfile {
> +        Some(x) => Some(CString::new(x).unwrap()),
> +        None => None
> +    };
> +    let master_keyfile_ptr = match master_keyfile_cstr {
> +        Some(x) => x.as_ptr(),
> +        None => ptr::null()
> +    };
> +
> +    let pbs = proxmox_backup_new_ns(
> +        pbs_repository_cstr.as_ptr(),
> +        ptr::null(),
> +        backup_id_cstr.as_ptr(),
> +        backup_time,
> +        PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE,
> +        pbs_password_cstr.as_ptr(),
> +        keyfile_ptr,
> +        key_password_ptr,
> +        master_keyfile_ptr,
> +        true,
> +        false,
> +        fingerprint_cstr.as_ptr(),
> +        &pbs_err as *const _ as *mut _);

1) &mut pbs_err, no cast

2)
Since you're going to be using `panic` below, you should add a wrapper
for the `*mut ProxmoxBackupHandle` which implements `Drop` accordingly.
(Unless you do start adapting `proxmox-backup-qemu`, in which case this
might just become an `Arc<BackupTask>` which already takes care of
this.)

> +
> +    if pbs == ptr::null_mut() {
> +        unsafe {
> +            let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +            panic!("proxmox_backup_new_ns failed: {}", pbs_err_cstr.to_str().unwrap());

I think it's nicer to just use Debug formatting (`{pbs_err_cstr:?}`).
Avoids typing out an extra unwrap and can be inlined in the format
string.

> +        }
> +    }
> +
> +    let connect_result = proxmox_backup_connect(pbs, &pbs_err as *const _ as *mut _);

^ &mut pbs_err, no cast

> +
> +    if connect_result < 0 {
> +        unsafe {
> +            let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +            panic!("proxmox_backup_connect failed: {}", pbs_err_cstr.to_str().unwrap());
> +        }
> +    }
> +
> +    let mut vma_reader = VmaReader::new(&vma_file_path);
> +
> +    // Handle configs
> +    let configs = vma_reader.get_configs();
> +    for (config_name, config_data) in &configs {

You're not reusing `configs` afterwards, so you can drop the `&`.
That way...

> +        println!("CFG: size: {} name: {}", config_data.len(), config_name);
> +
> +        let config_name_cstr = CString::new(config_name.as_bytes()).unwrap();

...you can also drop the `.as_bytes()` here since CString::new can also
work directly with a `String` since it takes an `impl Into<Vec<u8>>`.

> +
> +        if proxmox_backup_add_config(pbs, config_name_cstr.as_ptr(), config_data.as_ptr(), config_data.len() as u64, &pbs_err as *const _ as *mut _) < 0 {

^ &mut pbs_err, no cast

> +            unsafe {
> +                let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +                panic!("proxmox_backup_add_config failed: {}", pbs_err_cstr.to_str().unwrap());
> +            }
> +        }
> +    }
> +
> +    // Handle block devices
> +    for device_id in 0..255 {
> +        let device_name = match vma_reader.get_device_name(device_id) {
> +            Some(x) => { x }
> +            None => { continue; }
> +        };
> +
> +        let device_size = match vma_reader.get_device_size(device_id) {
> +            Some(x) => { x }
> +            None => { continue; }
> +        };
> +
> +        println!("DEV: dev_id={} size: {} devname: {}", device_id, device_size, device_name);
> +
> +        let device_name_cstr = CString::new(device_name.as_bytes()).unwrap();

^ you can just drop `.as_bytes()` here.

> +        let pbs_device_id = proxmox_backup_register_image(pbs, device_name_cstr.as_ptr(), device_size, false, &pbs_err as *const _ as *mut _);

^ &mut pbs_err, no cast

> +
> +        if pbs_device_id < 0 {
> +            unsafe {
> +                let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +                panic!("proxmox_backup_register_image failed: {}", pbs_err_cstr.to_str().unwrap());
> +            }
> +        }
> +
> +        let mut image_chunk_buffer = [0u8; PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE as usize];

Please let's not put a 4M buffer onto the stack :-)
Also note that with the way rust works, a debug build of
`Box::new([0; size])` will *also* first build that large buffer on the
stack and *afterwards* allocate and move it.

You can add `proxmox-io` as a dependency and use
`proxmox_io::boxed::zeroed(PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE)` here.
This uses `std::alloc` + `std::ptr::write_bytes` for initialization, or,
if nothing else from that crate will be used, just copy its 2 lines into
a helper around here, "malloc()+bzero()" is not enough magic to justify
a full crate dependency for a single function :-)

> +        let mut bytes_transferred = 0;
> +
> +        while bytes_transferred < device_size {
> +            let bytes_left = device_size - bytes_transferred;
> +            let this_chunk_size = if bytes_left < PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE { bytes_left } else { PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE };

^ This is
    let this_chunk_size = bytes_left.min(PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE);

> +            println!("Uploading dev_id: {} offset: {:#0X} - {:#0X}", device_id, bytes_transferred, bytes_transferred + this_chunk_size);
> +
> +            let is_zero_chunk = vma_reader.read_device_contents(device_id, &mut image_chunk_buffer, bytes_transferred).unwrap();

Don't use `unwrap()` on your own functions returning `Results`.
At the very least use `.expect()` to add a bit of context.
But also: anyhow + anyhow::Context trait make more sense.

> +            let write_data_result = proxmox_backup_write_data(
> +                pbs,
> +                pbs_device_id as u8,
> +                if is_zero_chunk { ptr::null() } else { image_chunk_buffer.as_ptr() },
> +                bytes_transferred,
> +                this_chunk_size,
> +                &pbs_err as *const _ as *mut _

^ &mut pbs_err, no cast

> +            );
> +
> +            if write_data_result < 0 {
> +                unsafe {
> +                    let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +                    panic!("proxmox_backup_write_data failed: {}", pbs_err_cstr.to_str().unwrap());
> +                }
> +            }
> +
> +            bytes_transferred += this_chunk_size;
> +        }
> +
> +        if proxmox_backup_close_image(pbs, pbs_device_id as u8, &pbs_err as *const _ as *mut _) < 0 {

^ &mut pbs_err, no cast

> +            unsafe {
> +                let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +                panic!("proxmox_backup_close_image failed: {}", pbs_err_cstr.to_str().unwrap());
> +            }
> +        }
> +    }
> +
> +    if proxmox_backup_finish(pbs, &pbs_err as *const _ as *mut _) < 0 {

^ &mut pbs_err, no cast

> +        unsafe {
> +            let pbs_err_cstr = CStr::from_ptr(pbs_err);
> +            panic!("proxmox_backup_finish failed: {}", pbs_err_cstr.to_str().unwrap());
> +        }
> +    }
> +
> +    proxmox_backup_disconnect(pbs);
> +}
> +
> +fn main() {
> +    let args: Vec<String> = env::args().collect();
> +    let program = args[0].clone();
> +
> +    let mut opts = Options::new();
> +    opts.optopt("", "repository", "Repository URL", "<auth_id>@<host>:<port>:<datastore>");
> +    opts.optopt("", "vmid", "Backup ID", "VMID");
> +    opts.optopt("", "fingerprint", "Proxmox Backup Server Fingerprint", "FINGERPRINT");
> +    opts.optopt("", "keyfile", "Key file", "KEYFILE");
> +    opts.optopt("", "master_keyfile", "Master key file", "MASTER_KEYFILE");
> +    opts.optflag("c", "compress", "Compress the Backup");
> +    opts.optflag("e", "encrypt", "Encrypt the Backup");
> +    opts.optflag("h", "help", "print this help menu");
> +    let matches = match opts.parse(&args[1..]) {
> +        Ok(m) => { m }
> +        Err(f) => { panic!("{}", f.to_string()) }
> +    };
> +
> +    if matches.opt_present("h") {
> +        print_usage(&program, opts);
> +        return;

Side note:
Since both `main` and `print_usage` return `()` you could just
    return print_usage(...);

> +    }
> +
> +    if !matches.opt_present("repository") {
> +        println!("missing --repository option");

^ should be `eprintln`

> +        print_usage(&program, opts);
> +        return;

IMO it would be friendlier to the user if we don't return here just yet,
but just set an `error` boolean and check the remaining required
arguments first, printing all the missing arguments at once, and only
then print the usage & exit.

> +    }
> +    let pbs_repository = matches.opt_str("repository").unwrap();

Also, instead of `opt_present` followed by `opt_str().unwrap()`, better
use only `opt_str` with a `match` statement.

Eg.
    let mut error = false;
    let mut get_required_opt = |name| match matches.opt_str(name) {
        Some(value) => value,
        None => {
            error = true;
            eprintln!("missing --{name} option");
            String::new() // don't care about the value
        }
    };

    let pbs_repository = get_required_opt("repository");
    let vmid = get_required_opt("vmid");
    let fingerprint = get_required_opt("fingerprint");
    if error {
        return print_usage(&program, opts);
    }

> +
> +    if !matches.opt_present("vmid") {
> +        println!("missing --vmid option");
> +        print_usage(&program, opts);
> +        return;
> +    }
> +    let vmid = matches.opt_str("vmid").unwrap();
> +
> +    if !matches.opt_present("fingerprint") {
> +        println!("missing --fingerprint option");
> +        print_usage(&program, opts);
> +        return;
> +    }
> +    let fingerprint = matches.opt_str("fingerprint").unwrap();
> +
> +    let keyfile = matches.opt_str("keyfile");
> +    let master_keyfile = matches.opt_str("master_keyfile");
> +    let compress = matches.opt_present("c");
> +    let encrypt = matches.opt_present("e");
> +
> +    let vma_file_path = if !matches.free.is_empty() {

^ We should probably check the exact len of `.free`, since *multiple*
free arguments are also an error, or at least not handled at all?

> +        matches.free[0].clone()
> +    } else {
> +        print_usage(&program, opts);
> +        return;
> +    };
> +
> +    let pbs_password = String::from_utf8(tty::read_password(&"Password: ").unwrap()).unwrap();
> +    let key_password = match keyfile {
> +        Some(_) => Some(String::from_utf8(tty::read_password(&"Key Password: ").unwrap()).unwrap()),
> +        None => None
> +    };
> +
> +    backup_vma_to_pbs(
> +        vma_file_path,
> +        pbs_repository,
> +        vmid,
> +        pbs_password,
> +        keyfile,
> +        key_password,
> +        master_keyfile,
> +        fingerprint,
> +        compress,
> +        encrypt
> +    );
> +}
> diff --git a/src/vma.rs b/src/vma.rs
> new file mode 100644
> index 0000000..4b239c8
> --- /dev/null
> +++ b/src/vma.rs
> @@ -0,0 +1,297 @@
> +extern crate md5;
> +
> +use std::fs::File;
> +use std::io::{Read, Seek, SeekFrom};
> +use std::mem::size_of;
> +use std::{cmp, error, str, result};

please drop `result` from this list.

> +use std::collections::HashMap;
> +
> +use array_init::array_init;
> +use bincode::*;

^ Please don't do `*` imports.
This is the only reason you need the `std::result` import, but you never
actually use `bincode::Result`, in fact, you actually only use
`bincode::Options`.

> +use serde::{Deserialize, Serialize};
> +use serde_big_array::BigArray;
> +
> +const VMA_BLOCKS_PER_EXTENT: usize = 59;
> +const VMA_MAX_CONFIGS: usize = 256;
> +
> +#[repr(C)]
> +#[derive(Serialize, Deserialize)]
> +struct VmaDeviceInfoHeader {
> +    pub device_name_offset: u32,
> +    reserved: [u8; 4],
> +    pub device_size: u64,
> +    reserved1: [u8; 16]
> +}
> +
> +#[repr(C)]
> +#[derive(Serialize, Deserialize)]
> +struct VmaHeader {
> +    pub magic: [u8; 4],
> +    pub version: u32,
> +    pub uuid: [u8; 16],
> +    pub ctime: u64,
> +    pub md5sum: [u8; 16],
> +    pub blob_buffer_offset: u32,
> +    pub blob_buffer_size: u32,
> +    pub header_size: u32,
> +    #[serde(with = "BigArray")]
> +    reserved: [u8; 1984],
> +    #[serde(with = "BigArray")]
> +    pub config_names: [u32; 256],
> +    #[serde(with = "BigArray")]
> +    pub config_data: [u32; 256],

^ These arrays should use `VMA_MAX_CONFIGS` instead of `256`.

> +    reserved1: [u8; 4],
> +    #[serde(with = "BigArray")]
> +    pub dev_info: [VmaDeviceInfoHeader; 256]

Should probably have a `VMA_MAX_DEVICES` here.

> +}
> +
> +#[repr(C)]
> +#[derive(Serialize, Deserialize)]
> +struct VmaBlockInfo {
> +    pub mask: u16,
> +    reserved: u8,
> +    pub dev_id: u8,
> +    pub cluster_num: u32
> +}
> +
> +#[repr(C)]
> +#[derive(Serialize, Deserialize)]
> +struct VmaExtentHeader {
> +    pub magic: [u8; 4],
> +    reserved: [u8; 2],
> +    pub block_count: u16,
> +    pub uuid: [u8; 16],
> +    pub md5sum: [u8; 16],
> +    #[serde(with = "BigArray")]
> +    pub blockinfo: [VmaBlockInfo; VMA_BLOCKS_PER_EXTENT]
> +}
> +
> +#[derive(Clone)]
> +struct VmaBlockIndexEntry {
> +    pub cluster_file_offset: u64,
> +    pub mask: u16
> +}
> +
> +pub struct VmaReader {
> +    vma_file: File,
> +    vma_header: VmaHeader,
> +    configs: HashMap<String, String>,
> +    block_index: [Vec<VmaBlockIndexEntry>; 256],

IMO this should be either a `Vec<Vec<VmaBlockIndexEntry>>` with checked
`.get()` calls, or at least `Box<>` the array, as `VmaRead` is now quite
a huge thing on the stack.
Alternatively the whole `VmaReader` could be boxed.

> +    blocks_are_indexed: bool
> +}
> +
> +impl VmaReader {
> +    pub fn new(vma_file_path: &str) -> Self {
> +        let mut vma_file = match File::open(vma_file_path) {
> +            Err(why) => panic!("couldn't open {}: {}", vma_file_path, why),
> +            Ok(file) => file,
> +        };
> +
> +        let vma_header = Self::read_header(&mut vma_file).unwrap();
> +        let configs = Self::read_blob_buffer(&mut vma_file, &vma_header).unwrap();
> +        let block_index: [Vec<VmaBlockIndexEntry>; 256] = array_init(|_| Vec::new());
> +
> +        let instance = Self {
> +            vma_file,
> +            vma_header,
> +            configs,
> +            block_index,
> +            blocks_are_indexed: false
> +        };
> +
> +        return instance;
> +    }
> +
> +    fn read_header(vma_file: &mut File) -> result::Result<VmaHeader, Box<dyn error::Error>> {
> +        let mut buffer = vec![0u8; size_of::<VmaHeader>()];

Better to `Vec::with_capacity()` and `.resize(size, 0)`. The above code
will create a temporary `VmaHeader`-sized zero-buffer on the stack
before moving it. In debug builds this can even be notice-ably slow. At
least in older rust versions building large [0u8; N] could take a
visible amount of time (depending on N even seconds at times...)

> +        vma_file.read_exact(&mut buffer)?;
> +
> +        let bincode_options = bincode::DefaultOptions::new()
> +            .with_fixint_encoding()
> +            .with_big_endian();
> +
> +        let vma_header: VmaHeader = bincode_options.deserialize(&buffer)?;
> +
> +        if vma_header.magic != [b'V', b'M', b'A', 0] {
> +            return Err("Invalid magic number".into());
> +        }
> +
> +        if vma_header.version != 1 {
> +            return Err(format!("Invalid VMA version {}", vma_header.version).into());
> +        }
> +
> +        buffer.resize(vma_header.header_size as usize, 0);
> +        vma_file.read_exact(&mut buffer[size_of::<VmaHeader>()..])?;
> +
> +        buffer[32..48].fill(0);

Might be worth mentioning that this is the md5 sum in the struct. In
fact, a helper casting a buffer to `&mut VmaHeader` and using the actual
member access to zero it out might be nice, but not necessary since
these are never going to "move". So feel free to just add a comment
instead, but a comment is definitely needed ;-)

> +        let computed_md5sum: [u8; 16] = md5::compute(&buffer).into();
> +
> +        if vma_header.md5sum != computed_md5sum {
> +            return Err("Wrong VMA header checksum".into());
> +        }
> +
> +        return Ok(vma_header);
> +    }
> +
> +    fn read_string_from_file(vma_file: &mut File, file_offset: u64) -> result::Result<String, Box<dyn error::Error>> {
> +        let mut size_bytes = [0u8; 2];
> +        vma_file.seek(SeekFrom::Start(file_offset))?;
> +        vma_file.read_exact(&mut size_bytes)?;
> +        let size = u16::from_le_bytes(size_bytes) as usize;
> +        let mut string_bytes = vec![0u8; size - 1];
> +        vma_file.read_exact(&mut string_bytes)?;
> +        let string = str::from_utf8(&string_bytes)?;
> +
> +        return Ok(string.to_string());
> +    }
> +
> +    fn read_blob_buffer(vma_file: &mut File, vma_header: &VmaHeader) -> result::Result<HashMap<String, String>, Box<dyn error::Error>> {
> +        let mut configs = HashMap::new();
> +
> +        for i in 0..VMA_MAX_CONFIGS {
> +            let config_name_offset = vma_header.config_names[i];
> +            let config_data_offset = vma_header.config_data[i];
> +
> +            if config_name_offset == 0 || config_data_offset == 0 {
> +                continue;
> +            }
> +
> +            let config_name_file_offset = (vma_header.blob_buffer_offset + config_name_offset) as u64;
> +            let config_data_file_offset = (vma_header.blob_buffer_offset + config_data_offset) as u64;
> +            let config_name = Self::read_string_from_file(vma_file, config_name_file_offset)?;
> +            let config_data = Self::read_string_from_file(vma_file, config_data_file_offset)?;
> +
> +            configs.insert(String::from(config_name), String::from(config_data));
> +        }
> +
> +        return Ok(configs);
> +    }
> +
> +    pub fn get_configs(&self) -> HashMap<String, String> {
> +        return self.configs.clone();
> +    }
> +
> +    pub fn get_device_name(&mut self, device_id: u8) -> Option<String> {

^ API-wise I don't think it makes much sense to encode the 256 limit
into the type here. You can get rid of a *lot* of `as usize` casts if
this is just an usize itself. But of course, you'd need to validate the
value then, but at least it can then be explicitly tied to a named MAX_*
constant to explain the limit.

> +        let device_name_offset = self.vma_header.dev_info[device_id as usize].device_name_offset;
> +
> +        if device_name_offset == 0 {
> +            return None;
> +        }
> +
> +        let device_name_file_offset = (self.vma_header.blob_buffer_offset + device_name_offset) as u64;
> +        let device_name = Self::read_string_from_file(&mut self.vma_file, device_name_file_offset).unwrap();
> +
> +        return Some(device_name.to_string());
> +    }
> +
> +    pub fn get_device_size(&self, device_id: u8) -> Option<u64> {

Much less typing if you first

    let dev_info = &self.vma_header.dev_info[device_id as usize];

and not duplicate the `self.vma_header.dev_info[device_id as usize]`
bit...

> +        if self.vma_header.dev_info[device_id as usize].device_name_offset == 0 {
> +            return None;
> +        }
> +
> +        return Some(self.vma_header.dev_info[device_id as usize].device_size);
> +    }
> +
> +    fn read_extent_header(vma_file: &mut File) -> result::Result<VmaExtentHeader, Box<dyn error::Error>> {
> +        let mut buffer = vec![0u8; size_of::<VmaExtentHeader>()];

Again, please manage the buffer as mentioned above in `read_header`.

> +        vma_file.read_exact(&mut buffer)?;
> +
> +        let bincode_options = bincode::DefaultOptions::new()
> +            .with_fixint_encoding()
> +            .with_big_endian();
> +
> +        let vma_extent_header: VmaExtentHeader = bincode_options.deserialize(&buffer)?;
> +
> +        if vma_extent_header.magic != [b'V', b'M', b'A', b'E'] {
> +            return Err("Invalid magic number".into());
> +        }
> +
> +        buffer[24..40].fill(0);

^ Same here, comment needed

> +        let computed_md5sum: [u8; 16] = md5::compute(&buffer).into();
> +
> +        if vma_extent_header.md5sum != computed_md5sum {
> +            return Err("Wrong VMA extent header checksum".into());
> +        }
> +
> +        return Ok(vma_extent_header);
> +    }
> +
> +    fn index_device_clusters(&mut self) -> result::Result<(), Box<dyn error::Error>> {
> +        for device_id in 0..255 {
> +            let device_size = match self.get_device_size(device_id) {
> +                Some(x) => { x }
> +                None => { continue; }
> +            };
> +
> +            let device_cluster_count = (device_size + 4096 * 16 - 1) / (4096 * 16);
> +
> +            let block_index_entry_placeholder = VmaBlockIndexEntry {
> +                cluster_file_offset: 0,
> +                mask: 0
> +            };
> +
> +            self.block_index[device_id as usize].resize(device_cluster_count as usize, block_index_entry_placeholder);
> +        }
> +
> +        let mut file_offset = self.vma_header.header_size as u64;
> +        let vma_file_size = self.vma_file.metadata()?.len();
> +
> +        while file_offset < vma_file_size {
> +            self.vma_file.seek(SeekFrom::Start(file_offset))?;
> +            let vma_extent_header = Self::read_extent_header(&mut self.vma_file)?;
> +            file_offset += 512;

^ Should be able to use `size_of::<VmaExtentHeader>()` here, no?

> +
> +            for i in 0..59 as usize {

^ Should use VMA_BLOCKS_PER_EXTENT instead of 59.

> +                let blockinfo = &vma_extent_header.blockinfo[i];
> +
> +                if blockinfo.dev_id == 0 {
> +                    continue;
> +                }
> +
> +                let block_index_entry = VmaBlockIndexEntry {
> +                    cluster_file_offset: file_offset,
> +                    mask: blockinfo.mask
> +                };
> +
> +                self.block_index[blockinfo.dev_id as usize][blockinfo.cluster_num as usize] = block_index_entry;
> +                file_offset += blockinfo.mask.count_ones() as u64 * 4096;
> +            }
> +        }
> +
> +        return Ok(());
> +    }
> +
> +    pub fn read_device_contents(&mut self, device_id: u8, buffer: &mut [u8], offset: u64) -> result::Result<bool, Box<dyn error::Error>> {
> +        assert_eq!(offset % (4096 * 16), 0);

And another reason for anyhow over `panic!` based error handling: I'm
not sure the above is supposed to be checked on release builds as well?

> +
> +        // Make sure that the device clusters are already indexed
> +        if !self.blocks_are_indexed {
> +            self.index_device_clusters()?;
> +        }
> +
> +        let this_device_block_index = &self.block_index[device_id as usize];
> +        let length = cmp::min(buffer.len(), this_device_block_index.len() * 4096 * 16 - offset as usize);
> +        let mut buffer_offset = 0;
> +        let mut buffer_is_zero = true;
> +
> +        while buffer_offset < length {
> +            let block_index_entry = &this_device_block_index[(offset as usize + buffer_offset) / (4096 * 16)];
> +            self.vma_file.seek(SeekFrom::Start(block_index_entry.cluster_file_offset))?;
> +
> +            for i in 0..16 {
> +                let current_block_mask = ((block_index_entry.mask >> i) & 1) == 1;
> +
> +                if current_block_mask {
> +                    self.vma_file.read_exact(&mut buffer[buffer_offset..(buffer_offset + 4096)])?;

it's possible to have a device that isn't exactly a multiple of
4096, meaning its very final block may be smaller, so the device size
needs to always be taken into account when reading such blocks.

You can produce such an image via `pvesm alloc` as it takes a multiple
of `1024` for its size, eg:

    # pvesm alloc local 100 vm-100-bad.raw $[(1<<20) + 1]

Will produce a disk that is 1G + 1024 bytes.

Now check what your tool does with the file
- while the final blocks are still zero
- then fill those blocks with data and check again.

> +                    buffer_is_zero = false;
> +                } else {
> +                    buffer[buffer_offset..(buffer_offset + 4096)].fill(0);
> +                }
> +
> +                buffer_offset += 4096;
> +            }
> +        }
> +
> +        return Ok(buffer_is_zero);
> +    }
> +}




  reply	other threads:[~2023-09-26 13:12 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22 12:41 Filip Schauer
2023-09-26 13:11 ` Wolfgang Bumiller [this message]
2023-09-29 13:22   ` Filip Schauer
2023-09-29 13:38     ` Lukas Wagner
2023-09-29 13:12 Filip Schauer
2023-09-29 13:15 ` Filip Schauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7qha55edsqqk6gy7etu7agyeh5itze2i6ocvbc6vp2n3dstgqt@g4a4m36mq5l7 \
    --to=w.bumiller@proxmox.com \
    --cc=f.schauer@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal