From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Filip Schauer <f.schauer@proxmox.com>, pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH vma-to-pbs v3 2/6] add support for bulk import of a dump directory
Date: Tue, 29 Oct 2024 11:41:16 +0100 [thread overview]
Message-ID: <173019847665.51916.11937343340126733147@yuna.proxmox.com> (raw)
In-Reply-To: <20241022142843.142191-3-f.schauer@proxmox.com>
Quoting Filip Schauer (2024-10-22 16:28:39)
> When a path to a directory is provided in the vma_file argument, try to
> upload all VMA backups in the directory. This also handles compressed
> VMA files, notes and logs. If a vmid is specified with --vmid, only the
> backups of that particular vmid are uploaded.
>
> This is intended for use on a dump directory:
>
> PBS_FINGERPRINT='PBS_FINGERPRINT' vma-to-pbs \
> --repository 'user@realm!token@server:port:datastore' \
> /var/lib/vz/dump
>
> Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
> ---
> Cargo.toml | 3 ++
> src/main.rs | 120 ++++++++++++++++++++++++++++++++++++++++++++-----
> src/vma2pbs.rs | 64 +++++++++++++++++++++++---
> 3 files changed, 169 insertions(+), 18 deletions(-)
>
> diff --git a/Cargo.toml b/Cargo.toml
> index cd13426..5c6a175 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -7,9 +7,12 @@ edition = "2021"
> [dependencies]
> anyhow = "1.0"
> bincode = "1.3"
> +chrono = "0.4"
> hyper = "0.14.5"
> +itertools = "0.13"
> pico-args = "0.5"
> md5 = "0.7.0"
> +regex = "1.7"
> scopeguard = "1.1.0"
> serde = "1.0"
> serde_json = "1.0"
> diff --git a/src/main.rs b/src/main.rs
> index 3e25591..83c59f6 100644
> --- a/src/main.rs
> +++ b/src/main.rs
> @@ -1,26 +1,33 @@
> use std::ffi::OsString;
> +use std::fs::read_dir;
> +use std::path::PathBuf;
>
> use anyhow::{bail, Context, Error};
> +use chrono::NaiveDateTime;
> +use itertools::Itertools;
> use proxmox_sys::linux::tty;
> use proxmox_time::epoch_i64;
> +use regex::Regex;
>
> mod vma;
> mod vma2pbs;
> -use vma2pbs::{vma2pbs, BackupVmaToPbsArgs, PbsArgs, VmaBackupArgs};
> +use vma2pbs::{vma2pbs, BackupVmaToPbsArgs, Compression, PbsArgs, VmaBackupArgs};
>
> const CMD_HELP: &str = "\
> Usage: vma-to-pbs [OPTIONS] --repository <auth_id@host:port:datastore> --vmid <VMID> [vma_file]
>
> Arguments:
> - [vma_file]
> + [vma_file | dump_directory]
>
> Options:
> --repository <auth_id@host:port:datastore>
> Repository URL
> [--ns <NAMESPACE>]
> Namespace
> - --vmid <VMID>
> + [--vmid <VMID>]
> Backup ID
> + If not specified, bulk import all VMA backups in the provided directory.
> + If specified with a dump directory, only import backups of the specified vmid.
this is missing the previous case that is still there - if a vma_file is given, then vmid is required.
> [--backup-time <EPOCH>]
> Backup timestamp
> --fingerprint <FINGERPRINT>
> @@ -87,7 +94,7 @@ fn parse_args() -> Result<BackupVmaToPbsArgs, Error> {
>
> let pbs_repository = args.value_from_str("--repository")?;
> let namespace = args.opt_value_from_str("--ns")?;
> - let vmid = args.value_from_str("--vmid")?;
> + let vmid = args.opt_value_from_str("--vmid")?;
> let backup_time: Option<i64> = args.opt_value_from_str("--backup-time")?;
> let backup_time = backup_time.unwrap_or_else(epoch_i64);
> let fingerprint = args.opt_value_from_str("--fingerprint")?;
> @@ -196,15 +203,106 @@ fn parse_args() -> Result<BackupVmaToPbsArgs, Error> {
> encrypt,
> };
>
> - let vma_args = VmaBackupArgs {
> - vma_file_path: vma_file_path.cloned(),
> - backup_id: vmid,
> - backup_time,
> - notes,
> - log_file_path,
> + let mut grouped_vmas = Vec::new();
I think this would be better off as a map? that would also allow constructing
it in place below..
> +
> + let bulk = if let Some(vma_file_path) = vma_file_path {
> + PathBuf::from(vma_file_path).is_dir()
> + } else {
> + false
> };
this is vma_file_path.is_some_and(|path| PathBuf::from(path).is_dir()) , but see below
>
> - let options = BackupVmaToPbsArgs { pbs_args, vma_args };
> + if bulk {
since bulk is only used here,
> + let dump_dir_path =
> + PathBuf::from(vma_file_path.expect("no directory specified for bulk import"));
and this does another conversion to a PathBuf, this could just be
let bulk =
vma_file_path
.map(PathBuf::from)
.and_then(|path| if path.is_dir() { Some(path) } else { None });
let grouped_vmas = if let Some(dump_dir_path) = bulk {
let re = Regex::new(
although I am not sure whether the possible cases here
- optional vmid, vma_file_path is set and a dir (the new bulk import)
- vmid is set, optional vma_file_path (the old single file or stdin import)
- no vmid set, vma_file_path not set or not a dir (invalid combination of options)
are not an argument for splitting single file and bulk import into separate
commands, or adding an explicit --bulk bool argument that would force a user to
opt into bulk importing?
> +
> + let re = Regex::new(
> + r"vzdump-qemu-(\d+)-(\d{4}_\d{2}_\d{2}-\d{2}_\d{2}_\d{2}).vma(|.zst|.lzo|.gz)$",
> + )?;
> +
> + let mut vmas = Vec::new();
> +
> + for entry in read_dir(dump_dir_path)? {
> + let entry = entry?;
> + let path = entry.path();
> +
> + if !path.is_file() {
> + continue;
> + }
> +
> + if let Some(file_name) = path.file_name().and_then(|n| n.to_str()) {
if we find a file that is not valid UTF-8, should we warn about it?
> + let Some((_, [backup_id, timestr, ext])) =
> + re.captures(file_name).map(|c| c.extract())
> + else {
> + // Skip the file, since it is not a VMA backup
> + continue;
> + };
> +
> + if let Some(ref vmid) = vmid {
> + if backup_id != vmid {
> + // Skip the backup, since it does not match the specified vmid
> + continue;
should we log these as well?
> + }
> + }
> +
> + let compression = match ext {
> + "" => None,
> + ".zst" => Some(Compression::Zstd),
> + ".lzo" => Some(Compression::Lzo),
> + ".gz" => Some(Compression::GZip),
> + _ => continue,
and these?
> + };
> +
> + let backup_time = NaiveDateTime::parse_from_str(timestr, "%Y_%m_%d-%H_%M_%S")?
> + .and_utc()
> + .timestamp();
> +
> + let notes_path = path.with_file_name(format!("{}.notes", file_name));
> + let notes = proxmox_sys::fs::file_read_optional_string(notes_path)?;
> +
> + let log_path = path.with_file_name(format!("{}.log", file_name));
> + let log_file_path = if log_path.exists() {
> + Some(log_path.to_path_buf().into_os_string())
> + } else {
> + None
> + };
and finally here all that are found and will be processed? ;)
> +
> + let backup_args = VmaBackupArgs {
> + vma_file_path: Some(path.clone().into()),
> + compression,
> + backup_id: backup_id.to_string(),
> + backup_time,
> + notes,
> + log_file_path,
> + };
> + vmas.push(backup_args);
> + }
> + }
> +
> + vmas.sort_by_key(|d| d.backup_time);
> + grouped_vmas = vmas
> + .into_iter()
> + .into_group_map_by(|d| d.backup_id.clone())
> + .into_values()
> + .collect();
> + grouped_vmas.sort_by_key(|d| d[0].backup_id.clone());
or here, if we just want to print a summary for the bulk imports like:
Found backup archives: {total_count}
Summary:
- VMID XX: {count}
- VMID XY: {count}
- ..
could even ask for confirmation before starting the bulk import?
> + } else if let Some(vmid) = vmid {
> + let vmas = vec![VmaBackupArgs {
> + vma_file_path: vma_file_path.cloned(),
> + compression: None,
> + backup_id: vmid,
> + backup_time,
> + notes,
> + log_file_path,
> + }];
> + grouped_vmas.push(vmas);
> + } else {
> + bail!("No vmid specified for single backup file");
> + }
> +
> + let options = BackupVmaToPbsArgs {
> + pbs_args,
> + grouped_vmas,
> + };
>
> Ok(options)
> }
> diff --git a/src/vma2pbs.rs b/src/vma2pbs.rs
> index 37ea308..05294f3 100644
> --- a/src/vma2pbs.rs
> +++ b/src/vma2pbs.rs
> @@ -4,6 +4,7 @@ use std::collections::HashMap;
> use std::ffi::{c_char, CStr, CString, OsString};
> use std::fs::File;
> use std::io::{stdin, BufRead, BufReader, Read};
> +use std::process::{Command, Stdio};
> use std::ptr;
> use std::time::SystemTime;
>
> @@ -30,7 +31,7 @@ const VMA_CLUSTER_SIZE: usize = 65536;
>
> pub struct BackupVmaToPbsArgs {
> pub pbs_args: PbsArgs,
> - pub vma_args: VmaBackupArgs,
> + pub grouped_vmas: Vec<Vec<VmaBackupArgs>>,
> }
>
> pub struct PbsArgs {
> @@ -45,8 +46,15 @@ pub struct PbsArgs {
> pub encrypt: bool,
> }
>
> +pub enum Compression {
> + Zstd,
> + Lzo,
> + GZip,
> +}
> +
> pub struct VmaBackupArgs {
> pub vma_file_path: Option<OsString>,
> + pub compression: Option<Compression>,
> pub backup_id: String,
> pub backup_time: i64,
> pub notes: Option<String>,
> @@ -468,7 +476,19 @@ pub fn vma2pbs(args: BackupVmaToPbsArgs) -> Result<(), Error> {
>
> let start_transfer_time = SystemTime::now();
>
> - upload_vma_file(pbs_args, &args.vma_args)?;
> + for vma_group in args.grouped_vmas {
> + for backup_args in vma_group {
> + if let Err(e) = upload_vma_file(pbs_args, &backup_args) {
> + eprintln!(
> + "Failed to upload vma file at {:?} - {}",
> + backup_args.vma_file_path.expect("missing VMA file path"),
if this is None it's a backup from stdin, and shouldn't print this confusing
error message? (ah, that happens in the next patch, but should be folded in
here ;))
> + e
> + );
> + println!("Skipping VMID {}", backup_args.backup_id);
> + break;
> + }
> + }
> + }
>
> let transfer_duration = SystemTime::now().duration_since(start_transfer_time)?;
> let total_seconds = transfer_duration.as_secs();
> @@ -481,13 +501,43 @@ pub fn vma2pbs(args: BackupVmaToPbsArgs) -> Result<(), Error> {
> }
>
> fn upload_vma_file(pbs_args: &PbsArgs, backup_args: &VmaBackupArgs) -> Result<(), Error> {
> - let vma_file: Box<dyn BufRead> = match &backup_args.vma_file_path {
> - Some(vma_file_path) => match File::open(vma_file_path) {
> - Err(why) => return Err(anyhow!("Couldn't open file: {}", why)),
> - Ok(file) => Box::new(BufReader::new(file)),
> + match &backup_args.vma_file_path {
> + Some(vma_file_path) => println!("Uploading VMA backup from {:?}", vma_file_path),
> + None => println!("Uploading VMA backup from (stdin)"),
> + };
> +
> + let vma_file: Box<dyn BufRead> = match &backup_args.compression {
> + Some(compression) => {
> + let vma_file_path = backup_args
> + .vma_file_path
> + .as_ref()
> + .expect("No VMA file path provided");
> + let mut cmd = match compression {
> + Compression::Zstd => {
> + let mut cmd = Command::new("zstd");
> + cmd.args(["-q", "-d", "-c"]);
> + cmd
> + }
> + Compression::Lzo => {
> + let mut cmd = Command::new("lzop");
> + cmd.args(["-d", "-c"]);
> + cmd
> + }
> + Compression::GZip => Command::new("zcat"),
> + };
> + let process = cmd.arg(vma_file_path).stdout(Stdio::piped()).spawn()?;
> + let stdout = process.stdout.expect("Failed to capture stdout");
> + Box::new(BufReader::new(stdout))
did you test whether with_capacity and a higher buffer size than 8K improves performance?
> + }
> + None => match &backup_args.vma_file_path {
> + Some(vma_file_path) => match File::open(vma_file_path) {
> + Err(why) => return Err(anyhow!("Couldn't open file: {}", why)),
> + Ok(file) => Box::new(BufReader::new(file)),
same here
> + },
> + None => Box::new(BufReader::new(stdin())),
> },
> - None => Box::new(BufReader::new(stdin())),
> };
> +
> let vma_reader = VmaReader::new(vma_file)?;
>
> let pbs = create_pbs_backup_task(pbs_args, backup_args)?;
> --
> 2.39.5
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2024-10-29 10:41 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-22 14:28 [pbs-devel] [PATCH vma-to-pbs v3 0/6] " Filip Schauer
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 1/6] split BackupVmaToPbsArgs into PbsArgs and VmaBackupArgs Filip Schauer
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 2/6] add support for bulk import of a dump directory Filip Schauer
2024-10-29 10:41 ` Fabian Grünbichler [this message]
2024-10-30 13:59 ` Filip Schauer
2024-10-30 14:04 ` Fabian Grünbichler
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 3/6] add option to skip vmids whose backups failed to upload Filip Schauer
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 4/6] remove hard coded values Filip Schauer
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 5/6] use level-based logging instead of println Filip Schauer
2024-10-22 14:28 ` [pbs-devel] [PATCH vma-to-pbs v3 6/6] log device upload progress as a percentage Filip Schauer
2024-10-29 10:39 ` Fabian Grünbichler
2024-10-29 10:51 ` [pbs-devel] [PATCH vma-to-pbs v3 0/6] add support for bulk import of a dump directory Fabian Grünbichler
2024-10-30 14:02 ` Filip Schauer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=173019847665.51916.11937343340126733147@yuna.proxmox.com \
--to=f.gruenbichler@proxmox.com \
--cc=f.schauer@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox