From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 1C867E830 for ; Tue, 26 Sep 2023 15:12:14 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id EBD001ADC for ; Tue, 26 Sep 2023 15:11:43 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Tue, 26 Sep 2023 15:11:41 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 6BA1348D54 for ; Tue, 26 Sep 2023 15:11:41 +0200 (CEST) Date: Tue, 26 Sep 2023 15:11:37 +0200 From: Wolfgang Bumiller To: Filip Schauer Cc: pbs-devel@lists.proxmox.com Message-ID: <7qha55edsqqk6gy7etu7agyeh5itze2i6ocvbc6vp2n3dstgqt@g4a4m36mq5l7> References: <20230922124111.121415-1-f.schauer@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230922124111.121415-1-f.schauer@proxmox.com> X-SPAM-LEVEL: Spam detection results: 0 AWL 0.100 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pbs-devel] [PATCH vma-to-pbs] Initial commit X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Sep 2023 13:12:14 -0000 First things first: please use rustfmt (and clippy, I'm pretty sure some of the things, eg `pbs_err`'s mutability casts, should be caught by it). Secondly: this is not a library, so it would be perfectly fine to use `anyhow` for error handling here. In fact, the panics/unwraps more than likely cause some errors to come out without any useful context/info, which probably won't produce useful output when users run into bugs. That said, a lot of the unwraps are fine, especially all those string<->C-string conversions... but those are an entire topic anyway (see further down). On Fri, Sep 22, 2023 at 02:41:11PM +0200, Filip Schauer wrote: > Implement a tool to import VMA files into a Proxmox Backup Server > > Signed-off-by: Filip Schauer > --- > I could not assign this to one distinct repository, so I propose a new > one. In its current state this will not be able to build a deb package > yet since the debian directory is missing. A separate repository is probably fine for a standalone binary. > The program calls extern "C" functions from proxmox-backup-qemu to > interface with the Backup Server. I chose these functions since those > are the only abstracted functions I could find. Ideally we'd have a public-facing version of the `pbs-client` crate at some point. The `proxmox-backup-qemu` code could be used as a base to design the public facing API for it. In its _current_ state, I'd personally describe `pbs-client`'s API as indecipherable and not suitable for a public facing crate. On the one hand it would definitely be nice to avoid the intermediate C api layer, but at the same time this would just double the amount of code that needs updating when pbs-client changes. So for now I suppose this would be fine, but I would definitely like to see an actual usable pbs client crate that ends up replacing `proxmox-backup-qemu`'s use of the `pbs-client` crate as well. This should be more maintainable. Optionally, since you're using proxmox-backup-qemu as a submodule, we could consider exposing some rust-typed functions there as an initial public API draft. Eg. instead of using `proxmox_backup_new_ns()`, its building of `BackupSetup` could be moved into a `BackupSetup::new()` which could be a `pub fn` to at least skip over the `str->Cstr->str` chain. > The proxmox-sys dependency is used for reading passwords from the tty. > > .cargo/config | 5 + > .gitmodules | 6 + > Cargo.toml | 16 ++ > Makefile | 70 ++++++++ > src/main.rs | 266 +++++++++++++++++++++++++++++ > src/vma.rs | 297 +++++++++++++++++++++++++++++++++ > submodules/proxmox | 1 + > submodules/proxmox-backup-qemu | 1 + > 8 files changed, 662 insertions(+) > create mode 100644 .cargo/config > create mode 100644 .gitmodules > create mode 100644 Cargo.toml > create mode 100644 Makefile > create mode 100644 src/main.rs > create mode 100644 src/vma.rs > create mode 160000 submodules/proxmox > create mode 160000 submodules/proxmox-backup-qemu > > diff --git a/.cargo/config b/.cargo/config > new file mode 100644 > index 0000000..3b5b6e4 > --- /dev/null > +++ b/.cargo/config > @@ -0,0 +1,5 @@ > +[source] > +[source.debian-packages] > +directory = "/usr/share/cargo/registry" > +[source.crates-io] > +replace-with = "debian-packages" > diff --git a/.gitmodules b/.gitmodules > new file mode 100644 > index 0000000..526f5ef > --- /dev/null > +++ b/.gitmodules > @@ -0,0 +1,6 @@ > +[submodule "submodules/proxmox-backup-qemu"] > + path = submodules/proxmox-backup-qemu > + url = git://git.proxmox.com/git/proxmox-backup-qemu.git > +[submodule "submodules/proxmox"] > + path = submodules/proxmox > + url = git://git.proxmox.com/git/proxmox.git ^ We don't need proxmox.git as submodule. > diff --git a/Cargo.toml b/Cargo.toml > new file mode 100644 > index 0000000..72e2812 > --- /dev/null > +++ b/Cargo.toml > @@ -0,0 +1,16 @@ > +[package] > +name = "vma-to-pbs" > +version = "0.0.1" > +authors = ["Filip Schauer "] > +edition = "2021" > + > +[dependencies] Please sort dependencies, and group them so that the submodule dependencies are a separate group, and our own packages are another. > +getopts = "0.2" I'm really not a friend of something that forces path based arguments to be `String`s. It accepts `OsStr` as *input* but only produces `String`s as output. This crate should never ever be used for serious programs! That said - the VMA files that exist - coming most likely exclusively from PVE - won't have non-utf8 paths... Still... I can't take this crate seriously in its current form. To my knowledge, `clap` is the only argument parser that supports sticking with `OsStrings` throughout. > +proxmox-sys = { path = "submodules/proxmox/proxmox-sys" } > +proxmox-backup-qemu = { path = "submodules/proxmox-backup-qemu" } > +proxmox-uuid = "1" > +md5 = "0.7.0" > +bincode = "1.3" > +serde = "1.0" I'm a bit skeptical here. The serde & bincode combination does not allow for deserializing into a pre-allocated memory location and the structs here are very large. You're placing some multiple megabytes of data onto the stack here. IMO it would be better to just read into a Box<[u8; N]>, cast it to a Box and do the byte-swapping in-place. I'll have to think about that. This seems somewhat convenient, but in the past, with rust, large structs on the stack have been a bit of a crashy mine-field. > +serde-big-array = "0.4.1" > +array-init = "2.0.1" > diff --git a/src/main.rs b/src/main.rs > new file mode 100644 > index 0000000..1e8c240 > --- /dev/null > +++ b/src/main.rs > @@ -0,0 +1,266 @@ > +extern crate getopts; > +extern crate proxmox_sys; > +extern crate proxmox_backup_qemu; > + > +use std::time::{SystemTime, UNIX_EPOCH}; > +use std::ffi::{c_char, CStr, CString}; > +use std::env; > +use std::ptr; > + > +use getopts::Options; > +use proxmox_sys::linux::tty; > +use proxmox_backup_qemu::*; > + > +mod vma; > +use vma::*; > + > +fn print_usage(program: &str, opts: Options) { This should have 2 variants: when called via `--help` it should print to `stdout`, when called due to an error it should print to `stderr`. > + let brief = format!("Usage: {} vma_file [options]", program); > + print!("{}", opts.usage(&brief)); > +} > + > +fn backup_vma_to_pbs( > + vma_file_path: String, > + pbs_repository: String, > + backup_id: String, > + pbs_password: String, > + keyfile: Option, > + key_password: Option, > + master_keyfile: Option, > + fingerprint: String, > + compress: bool, > + encrypt: bool > +) { > + println!("VMA input file: {}", vma_file_path); > + println!("PBS repository: {}", pbs_repository); > + println!("PBS fingerprint: {}", fingerprint); > + println!("compress: {}", compress); > + println!("encrypt: {}", encrypt); > + > + let backup_time = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs(); > + println!("backup time: {}", backup_time); > + > + let pbs_err: *mut c_char = ptr::null_mut(); ^ You're actually mutating this, so this should be `let mut pbs_err`. Otherwise you're basically producing undefined behavior since you cast it into a mutable value. All those &pbs_err as *const _ as *mut _ expressions can in facto simply be &mut pbs_err > + > + let pbs_repository_cstr = CString::new(pbs_repository).unwrap(); > + let backup_id_cstr = CString::new(backup_id).unwrap(); > + let pbs_password_cstr = CString::new(pbs_password).unwrap(); > + let fingerprint_cstr = CString::new(fingerprint).unwrap(); > + let keyfile_cstr = match keyfile { > + Some(x) => Some(CString::new(x).unwrap()), > + None => None > + }; > + let keyfile_ptr = match keyfile_cstr { > + Some(x) => x.as_ptr(), > + None => ptr::null() > + }; ^ I think all these pairs are more readable this way: let keyfile_cstr = keyfile.map(|v| CString::new(v).unwrap()); let keyfile_ptr = keyfile_cstr.map(|v| v.as_ptr()).unwrap_or(ptr::null()); at least it's more concise. But not a big deal, combinators also take some getting used to and might not be the more readable variant for everyone. It's a bit annoying that we need the cstr binding separately for its lifetime, which is another thing we could avoid with a more rust-y API for the backup client... > + let key_password_cstr = match key_password { > + Some(x) => Some(CString::new(x).unwrap()), > + None => None > + }; > + let key_password_ptr = match key_password_cstr { > + Some(x) => x.as_ptr(), > + None => ptr::null() > + }; > + let master_keyfile_cstr = match master_keyfile { > + Some(x) => Some(CString::new(x).unwrap()), > + None => None > + }; > + let master_keyfile_ptr = match master_keyfile_cstr { > + Some(x) => x.as_ptr(), > + None => ptr::null() > + }; > + > + let pbs = proxmox_backup_new_ns( > + pbs_repository_cstr.as_ptr(), > + ptr::null(), > + backup_id_cstr.as_ptr(), > + backup_time, > + PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE, > + pbs_password_cstr.as_ptr(), > + keyfile_ptr, > + key_password_ptr, > + master_keyfile_ptr, > + true, > + false, > + fingerprint_cstr.as_ptr(), > + &pbs_err as *const _ as *mut _); 1) &mut pbs_err, no cast 2) Since you're going to be using `panic` below, you should add a wrapper for the `*mut ProxmoxBackupHandle` which implements `Drop` accordingly. (Unless you do start adapting `proxmox-backup-qemu`, in which case this might just become an `Arc` which already takes care of this.) > + > + if pbs == ptr::null_mut() { > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_new_ns failed: {}", pbs_err_cstr.to_str().unwrap()); I think it's nicer to just use Debug formatting (`{pbs_err_cstr:?}`). Avoids typing out an extra unwrap and can be inlined in the format string. > + } > + } > + > + let connect_result = proxmox_backup_connect(pbs, &pbs_err as *const _ as *mut _); ^ &mut pbs_err, no cast > + > + if connect_result < 0 { > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_connect failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + > + let mut vma_reader = VmaReader::new(&vma_file_path); > + > + // Handle configs > + let configs = vma_reader.get_configs(); > + for (config_name, config_data) in &configs { You're not reusing `configs` afterwards, so you can drop the `&`. That way... > + println!("CFG: size: {} name: {}", config_data.len(), config_name); > + > + let config_name_cstr = CString::new(config_name.as_bytes()).unwrap(); ...you can also drop the `.as_bytes()` here since CString::new can also work directly with a `String` since it takes an `impl Into>`. > + > + if proxmox_backup_add_config(pbs, config_name_cstr.as_ptr(), config_data.as_ptr(), config_data.len() as u64, &pbs_err as *const _ as *mut _) < 0 { ^ &mut pbs_err, no cast > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_add_config failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + } > + > + // Handle block devices > + for device_id in 0..255 { > + let device_name = match vma_reader.get_device_name(device_id) { > + Some(x) => { x } > + None => { continue; } > + }; > + > + let device_size = match vma_reader.get_device_size(device_id) { > + Some(x) => { x } > + None => { continue; } > + }; > + > + println!("DEV: dev_id={} size: {} devname: {}", device_id, device_size, device_name); > + > + let device_name_cstr = CString::new(device_name.as_bytes()).unwrap(); ^ you can just drop `.as_bytes()` here. > + let pbs_device_id = proxmox_backup_register_image(pbs, device_name_cstr.as_ptr(), device_size, false, &pbs_err as *const _ as *mut _); ^ &mut pbs_err, no cast > + > + if pbs_device_id < 0 { > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_register_image failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + > + let mut image_chunk_buffer = [0u8; PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE as usize]; Please let's not put a 4M buffer onto the stack :-) Also note that with the way rust works, a debug build of `Box::new([0; size])` will *also* first build that large buffer on the stack and *afterwards* allocate and move it. You can add `proxmox-io` as a dependency and use `proxmox_io::boxed::zeroed(PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE)` here. This uses `std::alloc` + `std::ptr::write_bytes` for initialization, or, if nothing else from that crate will be used, just copy its 2 lines into a helper around here, "malloc()+bzero()" is not enough magic to justify a full crate dependency for a single function :-) > + let mut bytes_transferred = 0; > + > + while bytes_transferred < device_size { > + let bytes_left = device_size - bytes_transferred; > + let this_chunk_size = if bytes_left < PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE { bytes_left } else { PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE }; ^ This is let this_chunk_size = bytes_left.min(PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE); > + println!("Uploading dev_id: {} offset: {:#0X} - {:#0X}", device_id, bytes_transferred, bytes_transferred + this_chunk_size); > + > + let is_zero_chunk = vma_reader.read_device_contents(device_id, &mut image_chunk_buffer, bytes_transferred).unwrap(); Don't use `unwrap()` on your own functions returning `Results`. At the very least use `.expect()` to add a bit of context. But also: anyhow + anyhow::Context trait make more sense. > + let write_data_result = proxmox_backup_write_data( > + pbs, > + pbs_device_id as u8, > + if is_zero_chunk { ptr::null() } else { image_chunk_buffer.as_ptr() }, > + bytes_transferred, > + this_chunk_size, > + &pbs_err as *const _ as *mut _ ^ &mut pbs_err, no cast > + ); > + > + if write_data_result < 0 { > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_write_data failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + > + bytes_transferred += this_chunk_size; > + } > + > + if proxmox_backup_close_image(pbs, pbs_device_id as u8, &pbs_err as *const _ as *mut _) < 0 { ^ &mut pbs_err, no cast > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_close_image failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + } > + > + if proxmox_backup_finish(pbs, &pbs_err as *const _ as *mut _) < 0 { ^ &mut pbs_err, no cast > + unsafe { > + let pbs_err_cstr = CStr::from_ptr(pbs_err); > + panic!("proxmox_backup_finish failed: {}", pbs_err_cstr.to_str().unwrap()); > + } > + } > + > + proxmox_backup_disconnect(pbs); > +} > + > +fn main() { > + let args: Vec = env::args().collect(); > + let program = args[0].clone(); > + > + let mut opts = Options::new(); > + opts.optopt("", "repository", "Repository URL", "@::"); > + opts.optopt("", "vmid", "Backup ID", "VMID"); > + opts.optopt("", "fingerprint", "Proxmox Backup Server Fingerprint", "FINGERPRINT"); > + opts.optopt("", "keyfile", "Key file", "KEYFILE"); > + opts.optopt("", "master_keyfile", "Master key file", "MASTER_KEYFILE"); > + opts.optflag("c", "compress", "Compress the Backup"); > + opts.optflag("e", "encrypt", "Encrypt the Backup"); > + opts.optflag("h", "help", "print this help menu"); > + let matches = match opts.parse(&args[1..]) { > + Ok(m) => { m } > + Err(f) => { panic!("{}", f.to_string()) } > + }; > + > + if matches.opt_present("h") { > + print_usage(&program, opts); > + return; Side note: Since both `main` and `print_usage` return `()` you could just return print_usage(...); > + } > + > + if !matches.opt_present("repository") { > + println!("missing --repository option"); ^ should be `eprintln` > + print_usage(&program, opts); > + return; IMO it would be friendlier to the user if we don't return here just yet, but just set an `error` boolean and check the remaining required arguments first, printing all the missing arguments at once, and only then print the usage & exit. > + } > + let pbs_repository = matches.opt_str("repository").unwrap(); Also, instead of `opt_present` followed by `opt_str().unwrap()`, better use only `opt_str` with a `match` statement. Eg. let mut error = false; let mut get_required_opt = |name| match matches.opt_str(name) { Some(value) => value, None => { error = true; eprintln!("missing --{name} option"); String::new() // don't care about the value } }; let pbs_repository = get_required_opt("repository"); let vmid = get_required_opt("vmid"); let fingerprint = get_required_opt("fingerprint"); if error { return print_usage(&program, opts); } > + > + if !matches.opt_present("vmid") { > + println!("missing --vmid option"); > + print_usage(&program, opts); > + return; > + } > + let vmid = matches.opt_str("vmid").unwrap(); > + > + if !matches.opt_present("fingerprint") { > + println!("missing --fingerprint option"); > + print_usage(&program, opts); > + return; > + } > + let fingerprint = matches.opt_str("fingerprint").unwrap(); > + > + let keyfile = matches.opt_str("keyfile"); > + let master_keyfile = matches.opt_str("master_keyfile"); > + let compress = matches.opt_present("c"); > + let encrypt = matches.opt_present("e"); > + > + let vma_file_path = if !matches.free.is_empty() { ^ We should probably check the exact len of `.free`, since *multiple* free arguments are also an error, or at least not handled at all? > + matches.free[0].clone() > + } else { > + print_usage(&program, opts); > + return; > + }; > + > + let pbs_password = String::from_utf8(tty::read_password(&"Password: ").unwrap()).unwrap(); > + let key_password = match keyfile { > + Some(_) => Some(String::from_utf8(tty::read_password(&"Key Password: ").unwrap()).unwrap()), > + None => None > + }; > + > + backup_vma_to_pbs( > + vma_file_path, > + pbs_repository, > + vmid, > + pbs_password, > + keyfile, > + key_password, > + master_keyfile, > + fingerprint, > + compress, > + encrypt > + ); > +} > diff --git a/src/vma.rs b/src/vma.rs > new file mode 100644 > index 0000000..4b239c8 > --- /dev/null > +++ b/src/vma.rs > @@ -0,0 +1,297 @@ > +extern crate md5; > + > +use std::fs::File; > +use std::io::{Read, Seek, SeekFrom}; > +use std::mem::size_of; > +use std::{cmp, error, str, result}; please drop `result` from this list. > +use std::collections::HashMap; > + > +use array_init::array_init; > +use bincode::*; ^ Please don't do `*` imports. This is the only reason you need the `std::result` import, but you never actually use `bincode::Result`, in fact, you actually only use `bincode::Options`. > +use serde::{Deserialize, Serialize}; > +use serde_big_array::BigArray; > + > +const VMA_BLOCKS_PER_EXTENT: usize = 59; > +const VMA_MAX_CONFIGS: usize = 256; > + > +#[repr(C)] > +#[derive(Serialize, Deserialize)] > +struct VmaDeviceInfoHeader { > + pub device_name_offset: u32, > + reserved: [u8; 4], > + pub device_size: u64, > + reserved1: [u8; 16] > +} > + > +#[repr(C)] > +#[derive(Serialize, Deserialize)] > +struct VmaHeader { > + pub magic: [u8; 4], > + pub version: u32, > + pub uuid: [u8; 16], > + pub ctime: u64, > + pub md5sum: [u8; 16], > + pub blob_buffer_offset: u32, > + pub blob_buffer_size: u32, > + pub header_size: u32, > + #[serde(with = "BigArray")] > + reserved: [u8; 1984], > + #[serde(with = "BigArray")] > + pub config_names: [u32; 256], > + #[serde(with = "BigArray")] > + pub config_data: [u32; 256], ^ These arrays should use `VMA_MAX_CONFIGS` instead of `256`. > + reserved1: [u8; 4], > + #[serde(with = "BigArray")] > + pub dev_info: [VmaDeviceInfoHeader; 256] Should probably have a `VMA_MAX_DEVICES` here. > +} > + > +#[repr(C)] > +#[derive(Serialize, Deserialize)] > +struct VmaBlockInfo { > + pub mask: u16, > + reserved: u8, > + pub dev_id: u8, > + pub cluster_num: u32 > +} > + > +#[repr(C)] > +#[derive(Serialize, Deserialize)] > +struct VmaExtentHeader { > + pub magic: [u8; 4], > + reserved: [u8; 2], > + pub block_count: u16, > + pub uuid: [u8; 16], > + pub md5sum: [u8; 16], > + #[serde(with = "BigArray")] > + pub blockinfo: [VmaBlockInfo; VMA_BLOCKS_PER_EXTENT] > +} > + > +#[derive(Clone)] > +struct VmaBlockIndexEntry { > + pub cluster_file_offset: u64, > + pub mask: u16 > +} > + > +pub struct VmaReader { > + vma_file: File, > + vma_header: VmaHeader, > + configs: HashMap, > + block_index: [Vec; 256], IMO this should be either a `Vec>` with checked `.get()` calls, or at least `Box<>` the array, as `VmaRead` is now quite a huge thing on the stack. Alternatively the whole `VmaReader` could be boxed. > + blocks_are_indexed: bool > +} > + > +impl VmaReader { > + pub fn new(vma_file_path: &str) -> Self { > + let mut vma_file = match File::open(vma_file_path) { > + Err(why) => panic!("couldn't open {}: {}", vma_file_path, why), > + Ok(file) => file, > + }; > + > + let vma_header = Self::read_header(&mut vma_file).unwrap(); > + let configs = Self::read_blob_buffer(&mut vma_file, &vma_header).unwrap(); > + let block_index: [Vec; 256] = array_init(|_| Vec::new()); > + > + let instance = Self { > + vma_file, > + vma_header, > + configs, > + block_index, > + blocks_are_indexed: false > + }; > + > + return instance; > + } > + > + fn read_header(vma_file: &mut File) -> result::Result> { > + let mut buffer = vec![0u8; size_of::()]; Better to `Vec::with_capacity()` and `.resize(size, 0)`. The above code will create a temporary `VmaHeader`-sized zero-buffer on the stack before moving it. In debug builds this can even be notice-ably slow. At least in older rust versions building large [0u8; N] could take a visible amount of time (depending on N even seconds at times...) > + vma_file.read_exact(&mut buffer)?; > + > + let bincode_options = bincode::DefaultOptions::new() > + .with_fixint_encoding() > + .with_big_endian(); > + > + let vma_header: VmaHeader = bincode_options.deserialize(&buffer)?; > + > + if vma_header.magic != [b'V', b'M', b'A', 0] { > + return Err("Invalid magic number".into()); > + } > + > + if vma_header.version != 1 { > + return Err(format!("Invalid VMA version {}", vma_header.version).into()); > + } > + > + buffer.resize(vma_header.header_size as usize, 0); > + vma_file.read_exact(&mut buffer[size_of::()..])?; > + > + buffer[32..48].fill(0); Might be worth mentioning that this is the md5 sum in the struct. In fact, a helper casting a buffer to `&mut VmaHeader` and using the actual member access to zero it out might be nice, but not necessary since these are never going to "move". So feel free to just add a comment instead, but a comment is definitely needed ;-) > + let computed_md5sum: [u8; 16] = md5::compute(&buffer).into(); > + > + if vma_header.md5sum != computed_md5sum { > + return Err("Wrong VMA header checksum".into()); > + } > + > + return Ok(vma_header); > + } > + > + fn read_string_from_file(vma_file: &mut File, file_offset: u64) -> result::Result> { > + let mut size_bytes = [0u8; 2]; > + vma_file.seek(SeekFrom::Start(file_offset))?; > + vma_file.read_exact(&mut size_bytes)?; > + let size = u16::from_le_bytes(size_bytes) as usize; > + let mut string_bytes = vec![0u8; size - 1]; > + vma_file.read_exact(&mut string_bytes)?; > + let string = str::from_utf8(&string_bytes)?; > + > + return Ok(string.to_string()); > + } > + > + fn read_blob_buffer(vma_file: &mut File, vma_header: &VmaHeader) -> result::Result, Box> { > + let mut configs = HashMap::new(); > + > + for i in 0..VMA_MAX_CONFIGS { > + let config_name_offset = vma_header.config_names[i]; > + let config_data_offset = vma_header.config_data[i]; > + > + if config_name_offset == 0 || config_data_offset == 0 { > + continue; > + } > + > + let config_name_file_offset = (vma_header.blob_buffer_offset + config_name_offset) as u64; > + let config_data_file_offset = (vma_header.blob_buffer_offset + config_data_offset) as u64; > + let config_name = Self::read_string_from_file(vma_file, config_name_file_offset)?; > + let config_data = Self::read_string_from_file(vma_file, config_data_file_offset)?; > + > + configs.insert(String::from(config_name), String::from(config_data)); > + } > + > + return Ok(configs); > + } > + > + pub fn get_configs(&self) -> HashMap { > + return self.configs.clone(); > + } > + > + pub fn get_device_name(&mut self, device_id: u8) -> Option { ^ API-wise I don't think it makes much sense to encode the 256 limit into the type here. You can get rid of a *lot* of `as usize` casts if this is just an usize itself. But of course, you'd need to validate the value then, but at least it can then be explicitly tied to a named MAX_* constant to explain the limit. > + let device_name_offset = self.vma_header.dev_info[device_id as usize].device_name_offset; > + > + if device_name_offset == 0 { > + return None; > + } > + > + let device_name_file_offset = (self.vma_header.blob_buffer_offset + device_name_offset) as u64; > + let device_name = Self::read_string_from_file(&mut self.vma_file, device_name_file_offset).unwrap(); > + > + return Some(device_name.to_string()); > + } > + > + pub fn get_device_size(&self, device_id: u8) -> Option { Much less typing if you first let dev_info = &self.vma_header.dev_info[device_id as usize]; and not duplicate the `self.vma_header.dev_info[device_id as usize]` bit... > + if self.vma_header.dev_info[device_id as usize].device_name_offset == 0 { > + return None; > + } > + > + return Some(self.vma_header.dev_info[device_id as usize].device_size); > + } > + > + fn read_extent_header(vma_file: &mut File) -> result::Result> { > + let mut buffer = vec![0u8; size_of::()]; Again, please manage the buffer as mentioned above in `read_header`. > + vma_file.read_exact(&mut buffer)?; > + > + let bincode_options = bincode::DefaultOptions::new() > + .with_fixint_encoding() > + .with_big_endian(); > + > + let vma_extent_header: VmaExtentHeader = bincode_options.deserialize(&buffer)?; > + > + if vma_extent_header.magic != [b'V', b'M', b'A', b'E'] { > + return Err("Invalid magic number".into()); > + } > + > + buffer[24..40].fill(0); ^ Same here, comment needed > + let computed_md5sum: [u8; 16] = md5::compute(&buffer).into(); > + > + if vma_extent_header.md5sum != computed_md5sum { > + return Err("Wrong VMA extent header checksum".into()); > + } > + > + return Ok(vma_extent_header); > + } > + > + fn index_device_clusters(&mut self) -> result::Result<(), Box> { > + for device_id in 0..255 { > + let device_size = match self.get_device_size(device_id) { > + Some(x) => { x } > + None => { continue; } > + }; > + > + let device_cluster_count = (device_size + 4096 * 16 - 1) / (4096 * 16); > + > + let block_index_entry_placeholder = VmaBlockIndexEntry { > + cluster_file_offset: 0, > + mask: 0 > + }; > + > + self.block_index[device_id as usize].resize(device_cluster_count as usize, block_index_entry_placeholder); > + } > + > + let mut file_offset = self.vma_header.header_size as u64; > + let vma_file_size = self.vma_file.metadata()?.len(); > + > + while file_offset < vma_file_size { > + self.vma_file.seek(SeekFrom::Start(file_offset))?; > + let vma_extent_header = Self::read_extent_header(&mut self.vma_file)?; > + file_offset += 512; ^ Should be able to use `size_of::()` here, no? > + > + for i in 0..59 as usize { ^ Should use VMA_BLOCKS_PER_EXTENT instead of 59. > + let blockinfo = &vma_extent_header.blockinfo[i]; > + > + if blockinfo.dev_id == 0 { > + continue; > + } > + > + let block_index_entry = VmaBlockIndexEntry { > + cluster_file_offset: file_offset, > + mask: blockinfo.mask > + }; > + > + self.block_index[blockinfo.dev_id as usize][blockinfo.cluster_num as usize] = block_index_entry; > + file_offset += blockinfo.mask.count_ones() as u64 * 4096; > + } > + } > + > + return Ok(()); > + } > + > + pub fn read_device_contents(&mut self, device_id: u8, buffer: &mut [u8], offset: u64) -> result::Result> { > + assert_eq!(offset % (4096 * 16), 0); And another reason for anyhow over `panic!` based error handling: I'm not sure the above is supposed to be checked on release builds as well? > + > + // Make sure that the device clusters are already indexed > + if !self.blocks_are_indexed { > + self.index_device_clusters()?; > + } > + > + let this_device_block_index = &self.block_index[device_id as usize]; > + let length = cmp::min(buffer.len(), this_device_block_index.len() * 4096 * 16 - offset as usize); > + let mut buffer_offset = 0; > + let mut buffer_is_zero = true; > + > + while buffer_offset < length { > + let block_index_entry = &this_device_block_index[(offset as usize + buffer_offset) / (4096 * 16)]; > + self.vma_file.seek(SeekFrom::Start(block_index_entry.cluster_file_offset))?; > + > + for i in 0..16 { > + let current_block_mask = ((block_index_entry.mask >> i) & 1) == 1; > + > + if current_block_mask { > + self.vma_file.read_exact(&mut buffer[buffer_offset..(buffer_offset + 4096)])?; it's possible to have a device that isn't exactly a multiple of 4096, meaning its very final block may be smaller, so the device size needs to always be taken into account when reading such blocks. You can produce such an image via `pvesm alloc` as it takes a multiple of `1024` for its size, eg: # pvesm alloc local 100 vm-100-bad.raw $[(1<<20) + 1] Will produce a disk that is 1G + 1024 bytes. Now check what your tool does with the file - while the final blocks are still zero - then fill those blocks with data and check again. > + buffer_is_zero = false; > + } else { > + buffer[buffer_offset..(buffer_offset + 4096)].fill(0); > + } > + > + buffer_offset += 4096; > + } > + } > + > + return Ok(buffer_is_zero); > + } > +}