* [pbs-devel] [RFC PATCH backup-restore-image/proxmox-backup] improve file-restore for zpools
@ 2022-10-31 11:39 Dominik Csapak
  2022-10-31 11:39 ` [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug Dominik Csapak
                    ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dominik Csapak @ 2022-10-31 11:39 UTC (permalink / raw)
To: pbs-devel

when using file-restore for guests with a zpool, the minimal amount of memory
given (up to 384 MiB, depending on the disks) is often not enough, resulting
in an OOM kill of our restore daemon inside the vm. the easiest way to trigger
that is having many subvolumes in the pool (e.g. i could trigger it easily
with ~100 subvolumes)

to improve this, we dynamically add more memory (1 GiB) when we see that the
client tries to access a zpool. with this, i successfully tested up to 1000
subvolumes (although it took a few minutes to mount, not exactly sure why)

for this to work, we have to adapt our custom kernel config a bit, so that we
include the memory hotplug config options

RFC because:

* not sure if the value of 1 GiB is sensible; i tried to pick an amount that
  is not too much but should fix a wide range of cases. maybe we want to make
  that configurable? if we do, maybe we should discard this approach and
  simply make the general amount of memory configurable?
* not sure about the hotplug mechanism; we could use virtio-mem instead, but
  in the current case i don't see any advantages (we have to predefine the
  maximum value anyway, and since we only have a few points where we update
  the memory, we don't need to do it in small increments; it also would
  require more kernel config flags)
* i mostly ignore the QMP answers from qemu because they are mostly empty and
  hold no real information. we could of course build a complete qmp client in
  rust (i guess we need that sooner or later anyway), but depending on how
  exactly we want to parse qemu messages this would increase the amount of
  work significantly

Note that when we want to bump the file-restore package, we still need to
apply my previous patches for pve-common/storage [0]

patch 1/2 of proxmox-backup is a simple cleanup that could be applied
individually

0: https://lists.proxmox.com/pipermail/pve-devel/2022-May/053085.html

proxmox-backup-restore-image:

Dominik Csapak (1):
  add kernel config options for memory hotplug

 src/config-base | 6 ++++++
 1 file changed, 6 insertions(+)

proxmox-backup:

Dominik Csapak (2):
  file-restore: fix deprecated qemu parameters
  file-restore: dynamically increase memory of vm for zpools

 proxmox-file-restore/src/block_driver_qemu.rs | 45 +++++++++--
 proxmox-file-restore/src/qemu_helper.rs       | 74 ++++++++++++++++++-
 2 files changed, 108 insertions(+), 11 deletions(-)

--
2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread
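The OOM situation described in the cover letter can be reproduced by creating a large number of subvolumes in a guest's pool before backing it up. A rough helper for that, not part of the series; the pool name "rpool" and the count of 100 are arbitrary example values:

    use std::process::Command;

    // Create many ZFS subvolumes inside a test guest so that mounting the
    // pool in the restore VM later needs noticeably more memory.
    fn main() {
        let pool = "rpool";
        for i in 0..100 {
            let status = Command::new("zfs")
                .arg("create")
                .arg(format!("{pool}/subvol-{i}"))
                .status()
                .expect("failed to run zfs");
            assert!(status.success(), "zfs create failed for {pool}/subvol-{i}");
        }
    }

Backing up such a guest and opening the file-restore view on the pool then exercises the low-memory path the series addresses.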
* [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug 2022-10-31 11:39 [pbs-devel] [RFC PATCH backup-restore-image/proxmox-backup] improve file-restore for zpools Dominik Csapak @ 2022-10-31 11:39 ` Dominik Csapak 2022-11-07 12:48 ` [pbs-devel] applied: " Wolfgang Bumiller 2022-10-31 11:39 ` [pbs-devel] [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters Dominik Csapak 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools Dominik Csapak 2 siblings, 1 reply; 11+ messages in thread From: Dominik Csapak @ 2022-10-31 11:39 UTC (permalink / raw) To: pbs-devel so that we can hotplug memory in the guest Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- src/config-base | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/config-base b/src/config-base index e9af0ba..0f4f06d 100644 --- a/src/config-base +++ b/src/config-base @@ -144,3 +144,9 @@ CONFIG_ISO9660_FS=y CONFIG_NTFS_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y + +# memory hotplug +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y +CONFIG_MEMORY_HOTREMOVE=y +CONFIG_ACPI_HOTPLUG_MEMORY=y -- 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
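With these options enabled, memory blocks plugged in via ACPI show up under the kernel's standard sysfs memory-hotplug interface and, because of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, should come online without further action. A small diagnostic sketch that could be run inside the restore VM to verify this (not part of the series, it only reads the standard /sys/devices/system/memory layout):

    use std::fs;

    // Count how many memory blocks the kernel knows about and how many of
    // them are online, via the sysfs interface exposed by CONFIG_MEMORY_HOTPLUG.
    fn memory_block_states() -> std::io::Result<(usize, usize)> {
        let mut total = 0;
        let mut online = 0;
        for entry in fs::read_dir("/sys/devices/system/memory")? {
            let entry = entry?;
            let name = entry.file_name();
            // only the memoryN directories are interesting, skip e.g. block_size_bytes
            if !name.to_string_lossy().starts_with("memory") {
                continue;
            }
            let state = fs::read_to_string(entry.path().join("state"))?;
            total += 1;
            if state.trim() == "online" {
                online += 1;
            }
        }
        Ok((total, online))
    }

    fn main() -> std::io::Result<()> {
        let (total, online) = memory_block_states()?;
        println!("{online}/{total} memory blocks online");
        Ok(())
    }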
* [pbs-devel] applied: [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug Dominik Csapak @ 2022-11-07 12:48 ` Wolfgang Bumiller 0 siblings, 0 replies; 11+ messages in thread From: Wolfgang Bumiller @ 2022-11-07 12:48 UTC (permalink / raw) To: Dominik Csapak; +Cc: pbs-devel applied, thanks On Mon, Oct 31, 2022 at 12:39:51PM +0100, Dominik Csapak wrote: > so that we can hotplug memory in the guest > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > src/config-base | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/src/config-base b/src/config-base > index e9af0ba..0f4f06d 100644 > --- a/src/config-base > +++ b/src/config-base > @@ -144,3 +144,9 @@ CONFIG_ISO9660_FS=y > CONFIG_NTFS_FS=y > CONFIG_MSDOS_FS=y > CONFIG_VFAT_FS=y > + > +# memory hotplug > +CONFIG_MEMORY_HOTPLUG=y > +CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y > +CONFIG_MEMORY_HOTREMOVE=y > +CONFIG_ACPI_HOTPLUG_MEMORY=y > -- > 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters 2022-10-31 11:39 [pbs-devel] [RFC PATCH backup-restore-image/proxmox-backup] improve file-restore for zpools Dominik Csapak 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug Dominik Csapak @ 2022-10-31 11:39 ` Dominik Csapak 2022-11-04 12:29 ` [pbs-devel] applied: " Wolfgang Bumiller 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools Dominik Csapak 2 siblings, 1 reply; 11+ messages in thread From: Dominik Csapak @ 2022-10-31 11:39 UTC (permalink / raw) To: pbs-devel server and nowait are deprecated, so we should use the longform: server=on and wait=off Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- proxmox-file-restore/src/qemu_helper.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proxmox-file-restore/src/qemu_helper.rs b/proxmox-file-restore/src/qemu_helper.rs index fffa3bd9..d6f4c5a9 100644 --- a/proxmox-file-restore/src/qemu_helper.rs +++ b/proxmox-file-restore/src/qemu_helper.rs @@ -264,7 +264,7 @@ pub async fn start_vm( let debug_args = [ "-chardev", &format!( - "socket,id=debugser,path=/run/proxmox-backup/file-restore-serial-{}.sock,server,nowait", + "socket,id=debugser,path=/run/proxmox-backup/file-restore-serial-{}.sock,server=on,wait=off", cid ), "-serial", -- 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [pbs-devel] applied: [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters 2022-10-31 11:39 ` [pbs-devel] [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters Dominik Csapak @ 2022-11-04 12:29 ` Wolfgang Bumiller 0 siblings, 0 replies; 11+ messages in thread From: Wolfgang Bumiller @ 2022-11-04 12:29 UTC (permalink / raw) To: Dominik Csapak; +Cc: pbs-devel applied this one, thanks On Mon, Oct 31, 2022 at 12:39:52PM +0100, Dominik Csapak wrote: > server and nowait are deprecated, so we should use the longform: > server=on and wait=off > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > proxmox-file-restore/src/qemu_helper.rs | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/proxmox-file-restore/src/qemu_helper.rs b/proxmox-file-restore/src/qemu_helper.rs > index fffa3bd9..d6f4c5a9 100644 > --- a/proxmox-file-restore/src/qemu_helper.rs > +++ b/proxmox-file-restore/src/qemu_helper.rs > @@ -264,7 +264,7 @@ pub async fn start_vm( > let debug_args = [ > "-chardev", > &format!( > - "socket,id=debugser,path=/run/proxmox-backup/file-restore-serial-{}.sock,server,nowait", > + "socket,id=debugser,path=/run/proxmox-backup/file-restore-serial-{}.sock,server=on,wait=off", > cid > ), > "-serial", > -- > 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [pbs-devel] [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools 2022-10-31 11:39 [pbs-devel] [RFC PATCH backup-restore-image/proxmox-backup] improve file-restore for zpools Dominik Csapak 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug Dominik Csapak 2022-10-31 11:39 ` [pbs-devel] [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters Dominik Csapak @ 2022-10-31 11:39 ` Dominik Csapak 2022-11-07 12:35 ` [pbs-devel] applied: " Wolfgang Bumiller 2 siblings, 1 reply; 11+ messages in thread From: Dominik Csapak @ 2022-10-31 11:39 UTC (permalink / raw) To: pbs-devel when a backup contains a drive with zfs on it, the default memory size (up to 384 MiB) is often not enough to hold the zfs metadata to improve that situation, add memory dynamically (1GiB) when a path is requested that is on zfs. Note that the image must be started with a kernel capable of memory hotplug. to achieve that, we also have to add a qmp socket to the vm, so that we can later connect and add the memory backend and dimm Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- proxmox-file-restore/src/block_driver_qemu.rs | 45 ++++++++++-- proxmox-file-restore/src/qemu_helper.rs | 72 ++++++++++++++++++- 2 files changed, 107 insertions(+), 10 deletions(-) diff --git a/proxmox-file-restore/src/block_driver_qemu.rs b/proxmox-file-restore/src/block_driver_qemu.rs index 736ae2fd..c2cd8d49 100644 --- a/proxmox-file-restore/src/block_driver_qemu.rs +++ b/proxmox-file-restore/src/block_driver_qemu.rs @@ -1,7 +1,10 @@ //! Block file access via a small QEMU restore VM using the PBS block driver in QEMU use std::collections::HashMap; +use std::ffi::OsStr; use std::fs::{File, OpenOptions}; use std::io::{prelude::*, BufReader, BufWriter, SeekFrom}; +use std::os::unix::prelude::OsStrExt; +use std::path::Path; use anyhow::{bail, Error}; use futures::FutureExt; @@ -16,6 +19,7 @@ use pbs_datastore::catalog::ArchiveEntry; use super::block_driver::*; use crate::get_user_run_dir; +use crate::qemu_helper::set_dynamic_memory; const RESTORE_VM_MAP: &str = "restore-vm-map.json"; @@ -119,7 +123,7 @@ fn new_ticket() -> String { proxmox_uuid::Uuid::generate().to_string() } -async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Error> { +async fn ensure_running(details: &SnapRestoreDetails) -> Result<(i32, VsockClient), Error> { let name = make_name(&details.repo, &details.namespace, &details.snapshot); let mut state = VMStateMap::load()?; @@ -133,7 +137,7 @@ async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Err match res { Ok(_) => { // VM is running and we just reset its timeout, nothing to do - return Ok(client); + return Ok((vm.cid, client)); } Err(err) => { log::warn!("stale VM detected, restarting ({})", err); @@ -170,13 +174,30 @@ async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Err }; state.write()?; - Ok(VsockClient::new( + Ok(( new_cid, - DEFAULT_VSOCK_PORT, - Some(vms.ticket), + VsockClient::new(new_cid, DEFAULT_VSOCK_PORT, Some(vms.ticket)), )) } +fn path_is_zfs(path: &[u8]) -> bool { + if path.is_empty() { + return false; + } + let path = Path::new(OsStr::from_bytes(path)); + let mut components = path.components(); + let part = match components.next() { + Some(std::path::Component::RootDir) => match components.next() { + Some(std::path::Component::Normal(comp)) => comp, + _ => return false, + }, + Some(std::path::Component::Normal(comp)) => 
comp, + _ => return false, + }; + + part == OsStr::new("zpool") && components.next().is_some() +} + async fn start_vm(cid_request: i32, details: &SnapRestoreDetails) -> Result<VMState, Error> { let ticket = new_ticket(); let files = details @@ -199,10 +220,15 @@ impl BlockRestoreDriver for QemuBlockDriver { mut path: Vec<u8>, ) -> Async<Result<Vec<ArchiveEntry>, Error>> { async move { - let client = ensure_running(&details).await?; + let (cid, client) = ensure_running(&details).await?; if !path.is_empty() && path[0] != b'/' { path.insert(0, b'/'); } + if path_is_zfs(&path) { + if let Err(err) = set_dynamic_memory(cid, None).await { + log::error!("could not increase memory: {err}"); + } + } let path = base64::encode(img_file.bytes().chain(path).collect::<Vec<u8>>()); let mut result = client .get("api2/json/list", Some(json!({ "path": path }))) @@ -221,10 +247,15 @@ impl BlockRestoreDriver for QemuBlockDriver { zstd: bool, ) -> Async<Result<Box<dyn tokio::io::AsyncRead + Unpin + Send>, Error>> { async move { - let client = ensure_running(&details).await?; + let (cid, client) = ensure_running(&details).await?; if !path.is_empty() && path[0] != b'/' { path.insert(0, b'/'); } + if path_is_zfs(&path) { + if let Err(err) = set_dynamic_memory(cid, None).await { + log::error!("could not increase memory: {err}"); + } + } let path = base64::encode(img_file.bytes().chain(path).collect::<Vec<u8>>()); let (mut tx, rx) = tokio::io::duplex(1024 * 4096); let mut data = json!({ "path": path, "zstd": zstd }); diff --git a/proxmox-file-restore/src/qemu_helper.rs b/proxmox-file-restore/src/qemu_helper.rs index d6f4c5a9..7216d351 100644 --- a/proxmox-file-restore/src/qemu_helper.rs +++ b/proxmox-file-restore/src/qemu_helper.rs @@ -6,7 +6,12 @@ use std::path::PathBuf; use std::time::{Duration, Instant}; use anyhow::{bail, format_err, Error}; -use tokio::time; +use serde_json::json; +use tokio::io::AsyncBufRead; +use tokio::{ + io::{AsyncBufReadExt, AsyncWrite, AsyncWriteExt}, + time, +}; use nix::sys::signal::{kill, Signal}; use nix::unistd::Pid; @@ -22,6 +27,8 @@ use crate::{backup_user, cpio}; const PBS_VM_NAME: &str = "pbs-restore-vm"; const MAX_CID_TRIES: u64 = 32; +const DYNAMIC_MEMORY_MB: usize = 1024; +const QMP_SOCKET_PREFIX: &str = "/run/proxmox-backup/file-restore-qmp-"; fn create_restore_log_dir() -> Result<String, Error> { let logpath = format!("{}/file-restore", pbs_buildcfg::PROXMOX_BACKUP_LOG_DIR); @@ -121,6 +128,54 @@ async fn create_temp_initramfs(ticket: &str, debug: bool) -> Result<(File, Strin Ok((tmp_file, path)) } +async fn send_qmp_request<T: AsyncBufRead + AsyncWrite + Unpin>( + stream: &mut T, + request: &str, +) -> Result<String, Error> { + stream.write_all(request.as_bytes()).await?; + stream.flush().await?; + let mut buf = String::new(); + let _ = stream.read_line(&mut buf).await?; + Ok(buf) +} + +pub async fn set_dynamic_memory(cid: i32, target_memory: Option<usize>) -> Result<(), Error> { + let target_memory = match target_memory { + Some(size) if size > DYNAMIC_MEMORY_MB => { + bail!("cannot set to {}M, maximum is {}M", size, DYNAMIC_MEMORY_MB) + } + Some(size) => size, + None => DYNAMIC_MEMORY_MB, + }; + + let path = format!("{}{}.sock", QMP_SOCKET_PREFIX, cid); + let mut stream = tokio::io::BufStream::new(tokio::net::UnixStream::connect(path).await?); + + let _ = stream.read_line(&mut String::new()).await?; // initial qmp message + let _ = send_qmp_request(&mut stream, "{\"execute\":\"qmp_capabilities\"}\n").await?; + + let request = json!({ + "execute": "object-add", + 
"arguments": { + "qom-type": "memory-backend-ram", + "id": "mem0", + "size": target_memory * 1024 * 1024, + } + }); + let _ = send_qmp_request(&mut stream, &serde_json::to_string(&request)?).await?; + let request = json!({ + "execute": "device_add", + "arguments": { + "driver": "pc-dimm", + "id": "dimm0", + "memdev": "mem0", + } + }); + let _ = send_qmp_request(&mut stream, &serde_json::to_string(&request)?).await?; + + Ok(()) +} + pub async fn start_vm( // u16 so we can do wrapping_add without going too high mut cid: u16, @@ -185,7 +240,8 @@ pub async fn start_vm( &ramfs_path, "-append", &format!( - "{} panic=1 zfs_arc_min=0 zfs_arc_max=0", + "{} panic=1 zfs_arc_min=0 zfs_arc_max=0 memhp_default_state=online_kernel +", if debug { "debug" } else { "quiet" } ), "-daemonize", @@ -252,13 +308,23 @@ pub async fn start_vm( let mut qemu_cmd = std::process::Command::new("qemu-system-x86_64"); qemu_cmd.args(base_args.iter()); qemu_cmd.arg("-m"); - qemu_cmd.arg(ram.to_string()); + qemu_cmd.arg(format!( + "{ram}M,slots=1,maxmem={}M", + DYNAMIC_MEMORY_MB + ram + )); qemu_cmd.args(&drives); qemu_cmd.arg("-device"); qemu_cmd.arg(format!( "vhost-vsock-pci,guest-cid={},disable-legacy=on", cid )); + qemu_cmd.arg("-chardev"); + qemu_cmd.arg(format!( + "socket,id=qmp,path={}{}.sock,server=on,wait=off", + QMP_SOCKET_PREFIX, cid + )); + qemu_cmd.arg("-mon"); + qemu_cmd.arg("chardev=qmp,mode=control"); if debug { let debug_args = [ -- 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools 2022-10-31 11:39 ` [pbs-devel] [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools Dominik Csapak @ 2022-11-07 12:35 ` Wolfgang Bumiller 2022-11-07 17:15 ` Thomas Lamprecht 0 siblings, 1 reply; 11+ messages in thread From: Wolfgang Bumiller @ 2022-11-07 12:35 UTC (permalink / raw) To: Dominik Csapak; +Cc: pbs-devel applied AFAICT if the kernel patch is not applied it'll simply have no effect anyway, so we shouldn't need any "hard dependency bumps" where otherweise things break? On Mon, Oct 31, 2022 at 12:39:53PM +0100, Dominik Csapak wrote: > when a backup contains a drive with zfs on it, the default memory > size (up to 384 MiB) is often not enough to hold the zfs metadata > > to improve that situation, add memory dynamically (1GiB) when a path is > requested that is on zfs. Note that the image must be started with a > kernel capable of memory hotplug. > > to achieve that, we also have to add a qmp socket to the vm, so that > we can later connect and add the memory backend and dimm > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > proxmox-file-restore/src/block_driver_qemu.rs | 45 ++++++++++-- > proxmox-file-restore/src/qemu_helper.rs | 72 ++++++++++++++++++- > 2 files changed, 107 insertions(+), 10 deletions(-) > > diff --git a/proxmox-file-restore/src/block_driver_qemu.rs b/proxmox-file-restore/src/block_driver_qemu.rs > index 736ae2fd..c2cd8d49 100644 > --- a/proxmox-file-restore/src/block_driver_qemu.rs > +++ b/proxmox-file-restore/src/block_driver_qemu.rs > @@ -1,7 +1,10 @@ > //! Block file access via a small QEMU restore VM using the PBS block driver in QEMU > use std::collections::HashMap; > +use std::ffi::OsStr; > use std::fs::{File, OpenOptions}; > use std::io::{prelude::*, BufReader, BufWriter, SeekFrom}; > +use std::os::unix::prelude::OsStrExt; > +use std::path::Path; > > use anyhow::{bail, Error}; > use futures::FutureExt; > @@ -16,6 +19,7 @@ use pbs_datastore::catalog::ArchiveEntry; > > use super::block_driver::*; > use crate::get_user_run_dir; > +use crate::qemu_helper::set_dynamic_memory; > > const RESTORE_VM_MAP: &str = "restore-vm-map.json"; > > @@ -119,7 +123,7 @@ fn new_ticket() -> String { > proxmox_uuid::Uuid::generate().to_string() > } > > -async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Error> { > +async fn ensure_running(details: &SnapRestoreDetails) -> Result<(i32, VsockClient), Error> { > let name = make_name(&details.repo, &details.namespace, &details.snapshot); > let mut state = VMStateMap::load()?; > > @@ -133,7 +137,7 @@ async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Err > match res { > Ok(_) => { > // VM is running and we just reset its timeout, nothing to do > - return Ok(client); > + return Ok((vm.cid, client)); > } > Err(err) => { > log::warn!("stale VM detected, restarting ({})", err); > @@ -170,13 +174,30 @@ async fn ensure_running(details: &SnapRestoreDetails) -> Result<VsockClient, Err > }; > > state.write()?; > - Ok(VsockClient::new( > + Ok(( > new_cid, > - DEFAULT_VSOCK_PORT, > - Some(vms.ticket), > + VsockClient::new(new_cid, DEFAULT_VSOCK_PORT, Some(vms.ticket)), > )) > } > > +fn path_is_zfs(path: &[u8]) -> bool { > + if path.is_empty() { > + return false; > + } > + let path = Path::new(OsStr::from_bytes(path)); > + let mut components = path.components(); > + let part = match components.next() { > + 
Some(std::path::Component::RootDir) => match components.next() { > + Some(std::path::Component::Normal(comp)) => comp, > + _ => return false, > + }, > + Some(std::path::Component::Normal(comp)) => comp, > + _ => return false, > + }; > + > + part == OsStr::new("zpool") && components.next().is_some() > +} > + > async fn start_vm(cid_request: i32, details: &SnapRestoreDetails) -> Result<VMState, Error> { > let ticket = new_ticket(); > let files = details > @@ -199,10 +220,15 @@ impl BlockRestoreDriver for QemuBlockDriver { > mut path: Vec<u8>, > ) -> Async<Result<Vec<ArchiveEntry>, Error>> { > async move { > - let client = ensure_running(&details).await?; > + let (cid, client) = ensure_running(&details).await?; > if !path.is_empty() && path[0] != b'/' { > path.insert(0, b'/'); > } > + if path_is_zfs(&path) { > + if let Err(err) = set_dynamic_memory(cid, None).await { > + log::error!("could not increase memory: {err}"); > + } > + } > let path = base64::encode(img_file.bytes().chain(path).collect::<Vec<u8>>()); > let mut result = client > .get("api2/json/list", Some(json!({ "path": path }))) > @@ -221,10 +247,15 @@ impl BlockRestoreDriver for QemuBlockDriver { > zstd: bool, > ) -> Async<Result<Box<dyn tokio::io::AsyncRead + Unpin + Send>, Error>> { > async move { > - let client = ensure_running(&details).await?; > + let (cid, client) = ensure_running(&details).await?; > if !path.is_empty() && path[0] != b'/' { > path.insert(0, b'/'); > } > + if path_is_zfs(&path) { > + if let Err(err) = set_dynamic_memory(cid, None).await { > + log::error!("could not increase memory: {err}"); > + } > + } > let path = base64::encode(img_file.bytes().chain(path).collect::<Vec<u8>>()); > let (mut tx, rx) = tokio::io::duplex(1024 * 4096); > let mut data = json!({ "path": path, "zstd": zstd }); > diff --git a/proxmox-file-restore/src/qemu_helper.rs b/proxmox-file-restore/src/qemu_helper.rs > index d6f4c5a9..7216d351 100644 > --- a/proxmox-file-restore/src/qemu_helper.rs > +++ b/proxmox-file-restore/src/qemu_helper.rs > @@ -6,7 +6,12 @@ use std::path::PathBuf; > use std::time::{Duration, Instant}; > > use anyhow::{bail, format_err, Error}; > -use tokio::time; > +use serde_json::json; > +use tokio::io::AsyncBufRead; > +use tokio::{ > + io::{AsyncBufReadExt, AsyncWrite, AsyncWriteExt}, > + time, > +}; > > use nix::sys::signal::{kill, Signal}; > use nix::unistd::Pid; > @@ -22,6 +27,8 @@ use crate::{backup_user, cpio}; > > const PBS_VM_NAME: &str = "pbs-restore-vm"; > const MAX_CID_TRIES: u64 = 32; > +const DYNAMIC_MEMORY_MB: usize = 1024; > +const QMP_SOCKET_PREFIX: &str = "/run/proxmox-backup/file-restore-qmp-"; > > fn create_restore_log_dir() -> Result<String, Error> { > let logpath = format!("{}/file-restore", pbs_buildcfg::PROXMOX_BACKUP_LOG_DIR); > @@ -121,6 +128,54 @@ async fn create_temp_initramfs(ticket: &str, debug: bool) -> Result<(File, Strin > Ok((tmp_file, path)) > } > > +async fn send_qmp_request<T: AsyncBufRead + AsyncWrite + Unpin>( > + stream: &mut T, > + request: &str, > +) -> Result<String, Error> { > + stream.write_all(request.as_bytes()).await?; > + stream.flush().await?; > + let mut buf = String::new(); > + let _ = stream.read_line(&mut buf).await?; > + Ok(buf) > +} > + > +pub async fn set_dynamic_memory(cid: i32, target_memory: Option<usize>) -> Result<(), Error> { > + let target_memory = match target_memory { > + Some(size) if size > DYNAMIC_MEMORY_MB => { > + bail!("cannot set to {}M, maximum is {}M", size, DYNAMIC_MEMORY_MB) > + } > + Some(size) => size, > + None => DYNAMIC_MEMORY_MB, > + }; > 
+ > + let path = format!("{}{}.sock", QMP_SOCKET_PREFIX, cid); > + let mut stream = tokio::io::BufStream::new(tokio::net::UnixStream::connect(path).await?); > + > + let _ = stream.read_line(&mut String::new()).await?; // initial qmp message > + let _ = send_qmp_request(&mut stream, "{\"execute\":\"qmp_capabilities\"}\n").await?; > + > + let request = json!({ > + "execute": "object-add", > + "arguments": { > + "qom-type": "memory-backend-ram", > + "id": "mem0", > + "size": target_memory * 1024 * 1024, > + } > + }); > + let _ = send_qmp_request(&mut stream, &serde_json::to_string(&request)?).await?; > + let request = json!({ > + "execute": "device_add", > + "arguments": { > + "driver": "pc-dimm", > + "id": "dimm0", > + "memdev": "mem0", > + } > + }); > + let _ = send_qmp_request(&mut stream, &serde_json::to_string(&request)?).await?; > + > + Ok(()) > +} > + > pub async fn start_vm( > // u16 so we can do wrapping_add without going too high > mut cid: u16, > @@ -185,7 +240,8 @@ pub async fn start_vm( > &ramfs_path, > "-append", > &format!( > - "{} panic=1 zfs_arc_min=0 zfs_arc_max=0", > + "{} panic=1 zfs_arc_min=0 zfs_arc_max=0 memhp_default_state=online_kernel > +", > if debug { "debug" } else { "quiet" } > ), > "-daemonize", > @@ -252,13 +308,23 @@ pub async fn start_vm( > let mut qemu_cmd = std::process::Command::new("qemu-system-x86_64"); > qemu_cmd.args(base_args.iter()); > qemu_cmd.arg("-m"); > - qemu_cmd.arg(ram.to_string()); > + qemu_cmd.arg(format!( > + "{ram}M,slots=1,maxmem={}M", > + DYNAMIC_MEMORY_MB + ram > + )); > qemu_cmd.args(&drives); > qemu_cmd.arg("-device"); > qemu_cmd.arg(format!( > "vhost-vsock-pci,guest-cid={},disable-legacy=on", > cid > )); > + qemu_cmd.arg("-chardev"); > + qemu_cmd.arg(format!( > + "socket,id=qmp,path={}{}.sock,server=on,wait=off", > + QMP_SOCKET_PREFIX, cid > + )); > + qemu_cmd.arg("-mon"); > + qemu_cmd.arg("chardev=qmp,mode=control"); > > if debug { > let debug_args = [ > -- > 2.30.2 > > > > _______________________________________________ > pbs-devel mailing list > pbs-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel > > ^ permalink raw reply [flat|nested] 11+ messages in thread
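On the wire, the QMP exchange in set_dynamic_memory (quoted above) boils down to a greeting line from QEMU followed by three JSON requests whose replies are read once and discarded. A minimal synchronous equivalent of that exchange, with the socket path and the 1 GiB size hard-coded as example values and a trailing newline added after each request for clarity:

    use std::io::{BufRead, BufReader, Write};
    use std::os::unix::net::UnixStream;

    fn main() -> std::io::Result<()> {
        let stream = UnixStream::connect("/run/proxmox-backup/file-restore-qmp-42.sock")?;
        let mut reader = BufReader::new(stream.try_clone()?);
        let mut writer = stream;

        let mut line = String::new();
        reader.read_line(&mut line)?; // greeting banner from QEMU

        for req in [
            r#"{"execute":"qmp_capabilities"}"#.to_string(),
            format!(
                r#"{{"execute":"object-add","arguments":{{"qom-type":"memory-backend-ram","id":"mem0","size":{}}}}}"#,
                1024u64 * 1024 * 1024
            ),
            r#"{"execute":"device_add","arguments":{"driver":"pc-dimm","id":"dimm0","memdev":"mem0"}}"#.to_string(),
        ] {
            writer.write_all(req.as_bytes())?;
            writer.write_all(b"\n")?;
            writer.flush()?;
            line.clear();
            reader.read_line(&mut line)?; // response is ignored, as in the patch
        }
        Ok(())
    }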
* Re: [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools
  2022-11-07 12:35 ` [pbs-devel] applied: " Wolfgang Bumiller
@ 2022-11-07 17:15   ` Thomas Lamprecht
  2022-11-08  7:59     ` Dominik Csapak
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Lamprecht @ 2022-11-07 17:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Wolfgang Bumiller, Dominik Csapak

Am 07/11/2022 um 13:35 schrieb Wolfgang Bumiller:
> applied

meh, can we please make this opt-in and only enable it for root@pam or for
users with some powerful priv on /, as discussed as the chosen approach to
allow more memory the last time this came up (off list IIRC)... I really do
*not* want the memory DOS potential increased by a lot just by opening some
file-restore tabs; this actually should get more restrictions (for common
"non powerful" users), not fewer..

> AFAICT if the kernel patch is not applied it'll simply have no effect
> anyway, so we shouldn't need any "hard dependency bumps" where
> otherwise things break?

if you actually want to enforce that the new behavior is there you need a
dependency bump.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools
  2022-11-07 17:15   ` Thomas Lamprecht
@ 2022-11-08  7:59     ` Dominik Csapak
  2022-11-08  9:19       ` Thomas Lamprecht
  0 siblings, 1 reply; 11+ messages in thread
From: Dominik Csapak @ 2022-11-08 7:59 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion, Wolfgang Bumiller

On 11/7/22 18:15, Thomas Lamprecht wrote:
> Am 07/11/2022 um 13:35 schrieb Wolfgang Bumiller:
>> applied
>
> meh, can we please make this opt-in and only enable it for root@pam or for
> users with some powerful priv on /, as discussed as the chosen approach to
> allow more memory the last time this came up (off list IIRC)... I really do
> *not* want the memory DOS potential increased by a lot just by opening some
> file-restore tabs; this actually should get more restrictions (for common
> "non powerful" users), not fewer..

understandable, so i can do that, but maybe it's time we rethink the
file-restore mechanism as a whole, since it's currently not very ergonomic:

* users don't know how many and which file-restore vms are running, they
  may not even know it starts a vm at all
* regardless of whether my patch is applied, the only thing necessary to start
  a bunch of vms is VM.Backup on the vm and Datastore.AllocateSpace on the
  storage (which in turn is probably enough to create an arbitrary amount of
  backups)
* since backups can contain arbitrarily sized disks/filesystems, no fixed
  amount of memory we give will always be enough

so here are some proposals on how to improve that (we won't implement all of
them simultaneously, but maybe something from that list is usable)

* make the list of running file-restore vms visible, and maybe add a manual
  'shutdown'
* limit the amount of restore vms per user (or per vm?)
  - this would need the mechanism from above anyway, since otherwise either
    the user cannot start the restore vm or we abort an older vm (with a
    possibly running download operation)
* make the vm memory configurable (per user/vm/globally?)
* limit the global memory usage for file-restore vms
  - again needs some control mechanism for stopping/seeing these vms
* throw away the automatic starting of vms, and make it explicit, i.e.
  make the user start/shutdown a vm manually
  - we can have some 'configuration panel' before starting (like with restore)
  - the user is aware it's starting
  - still needs some mechanism to see them, but with a manual starting api
    call it's easier to have e.g. a running worker that can be stopped

>> AFAICT if the kernel patch is not applied it'll simply have no effect
>> anyway, so we shouldn't need any "hard dependency bumps" where
>> otherwise things break?
>
> if you actually want to enforce that the new behavior is there you need a
> dependency bump.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools
  2022-11-08  7:59     ` Dominik Csapak
@ 2022-11-08  9:19       ` Thomas Lamprecht
  2022-11-08 10:07         ` Dominik Csapak
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Lamprecht @ 2022-11-08 9:19 UTC (permalink / raw)
To: Dominik Csapak, Proxmox Backup Server development discussion, Wolfgang Bumiller

Am 08/11/2022 um 08:59 schrieb Dominik Csapak:
> On 11/7/22 18:15, Thomas Lamprecht wrote:
>> Am 07/11/2022 um 13:35 schrieb Wolfgang Bumiller:
>>> applied
>>
>> meh, can we please make this opt-in and only enable it for root@pam or for
>> users with some powerful priv on /, as discussed as the chosen approach to
>> allow more memory the last time this came up (off list IIRC)... I really do
>> *not* want the memory DOS potential increased by a lot just by opening some
>> file-restore tabs; this actually should get more restrictions (for common
>> "non powerful" users), not fewer..
>
> understandable, so i can do that, but maybe it's time we rethink the
> file-restore mechanism as a whole, since it's currently not very ergonomic:

IMO the current approach is very ergonomic, as the user doesn't have to care
about details at all, and I'd like to keep it that way as much as possible.
While we may need to add something, every extra setting or extra workload for
users/admins should be kept at a minimum.

> * users don't know how many and which file-restore vms are running, they
>   may not even know it starts a vm at all

that isn't an issue, the user wants file restores, not to care about managing
the details.

> * regardless of whether my patch is applied, the only thing necessary to start
>   a bunch of vms is VM.Backup on the vm and Datastore.AllocateSpace on the
>   storage (which in turn is probably enough to create an arbitrary amount of
>   backups)
> * since backups can contain arbitrarily sized disks/filesystems, no fixed
>   amount of memory we give will always be enough
>
> so here are some proposals on how to improve that (we won't implement all of
> them simultaneously, but maybe something from that list is usable)
>
> * make the list of running file-restore vms visible, and maybe add a manual
>   'shutdown'

the VM lives for a minute and can be spawned in milliseconds, so I don't really
see the benefit of that; a misguided/bad/... user can spawn such VMs much
faster than one can stop them, and the syslog already contains basic info.

> * limit the amount of restore vms per user (or per vm?)
>   - this would need the mechanism from above anyway, since otherwise either
>     the user cannot start the restore vm or we abort an older vm (with a
>     possibly running download operation)

which mechanism? The list of running VMs is only required in the backend and
we can already find that info out there.

> * make the vm memory configurable (per user/vm/globally?)

meh, but it will be one of the better mechanisms together with the priv/unpriv
distinction and different limits enforced via a cgroup (see below)

> * limit the global memory usage for file-restore vms
>   - again needs some control mechanism for stopping/seeing these vms

Not really, it just needs a cgroup limiting all of them. Unpriv user A really
won't be able to stop the file-restore VM of unpriv user B, so this is not
really gaining anything - you cannot expect that an admin is around all the
time.

> * throw away the automatic starting of vms, and make it explicit, i.e.

NACK, worsens UX a lot without really a benefit; the VM will still be started
in the end, and whatever limiting mechanism we choose could be applied without
any dialogue anyway..

> make the user start/shutdown a vm manually
>   - we can have some 'configuration panel' before starting (like with restore)
>   - the user is aware it's starting
>   - still needs some mechanism to see them, but with a manual starting api
>     call it's easier to have e.g. a running worker that can be stopped

As said weeks ago, the simple stopgap for now is to only opt into more possible
memory once a user has a sufficient privilege (e.g., VM.Allocate and/or
Sys.Modify on / as a starter); that does away with the basic woes for now.
Sure, a priv and an unpriv user may get to "open" the same restore VM, and
depending on the order it will then be able to use more or less resources, but
that won't matter much in practice, and the same is true for the debug flag.

As a small addition, with possibly nice effects in practice: add a "Finish"
button to the file-restore window that actively sends a signal to the VM on
press, which then reduces the idle shutdown timer to ~3s (to avoid breaking
other currently running restores or another tab that still has it open). That
way you don't need extra lists and management, as most of the time it happens
indirectly just fine.

Additionally we can put all VMs in a cgroup with a max-memory limit configured;
that really could be a new setting in the node or datacenter.cfg, and it can
easily be nested as in:

- root restore CG with a total limit, restores started by priv'd users start here
  '- user A CG with a per-unpriv-user limit
  '- user B CG with a per-unpriv-user limit

That way we would not need to limit the number of restores, which we don't care
about, but the actual resource we do care about. As a side benefit, they could
get reduced CPU shares so that scheduling prefers the "real" workload.

Another possibility would be to also evaluate the MicroVM machine type to
reduce the basic impact of the OS - tbh. I'm not sure if Stefan checked that
out.

And please note that this is all mostly an artificial problem anyway: ZFS isn't
that popular inside of guests, which are the prime target for file-restore, and
especially not in the form of a huge set of disks, as redundancy et al. is
normally covered more centrally on the host. If it's rather used for snapshots,
lvm-thin or btrfs may be a better option inside guests, as they're in-kernel-tree
and one doesn't pay such a (relatively) big memory tax. IIRC the issue even only
got reported from internal testing of PVE or the like, iow. odd setups. So
please let's keep this simple and avoid breaking UX for all the actually
realistic use cases.

So please, make this opt-in on the PBS side first, then opt-in for the
respective user class (even just root@pam can be argued if really unsure),
else I'll need to revert this for the next bump.

^ permalink raw reply	[flat|nested] 11+ messages in thread
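The nested cgroup idea sketched above could, with cgroup v2, boil down to two nested directories with memory.max set and the QEMU PID moved into the inner one. A purely illustrative sketch: the group names, limits and paths are invented here, and the parent group would need the memory controller enabled via cgroup.subtree_control before the writes succeed.

    use std::fs;
    use std::path::Path;

    // Illustrative sketch of the nested layout from the mail above:
    //   file-restore/           <- total memory.max for all restore VMs
    //   file-restore/<userid>/  <- per-unprivileged-user memory.max
    // Assumes a cgroup v2 mount at /sys/fs/cgroup with the memory controller
    // delegated to the "file-restore" group.
    fn limit_restore_vm(
        userid: &str,
        vm_pid: u32,
        total_max_bytes: u64,
        user_max_bytes: u64,
    ) -> std::io::Result<()> {
        let root = Path::new("/sys/fs/cgroup/file-restore");
        let user_cg = root.join(userid);

        fs::create_dir_all(&user_cg)?;
        fs::write(root.join("memory.max"), total_max_bytes.to_string())?;
        fs::write(user_cg.join("memory.max"), user_max_bytes.to_string())?;

        // moving the PID into the inner (leaf) group makes both limits apply to QEMU
        fs::write(user_cg.join("cgroup.procs"), vm_pid.to_string())?;
        Ok(())
    }

    fn main() -> std::io::Result<()> {
        // example values: 4 GiB total, 1.5 GiB per unprivileged user
        limit_restore_vm("user-a@pbs", 12345, 4 * 1024 * 1024 * 1024, 1536 * 1024 * 1024)
    }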
* Re: [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools 2022-11-08 9:19 ` Thomas Lamprecht @ 2022-11-08 10:07 ` Dominik Csapak 0 siblings, 0 replies; 11+ messages in thread From: Dominik Csapak @ 2022-11-08 10:07 UTC (permalink / raw) To: Thomas Lamprecht, Proxmox Backup Server development discussion, Wolfgang Bumiller On 11/8/22 10:19, Thomas Lamprecht wrote: > Am 08/11/2022 um 08:59 schrieb Dominik Csapak: >> On 11/7/22 18:15, Thomas Lamprecht wrote: >>> Am 07/11/2022 um 13:35 schrieb Wolfgang Bumiller: >>>> applied >>> >>> meh, can we please get this opt-in any only enabled it for root@pam or for users >>> with some powerfull priv on / as talked as chosen approach to allow more memory the >>> last time this came up (off list IIRC)... I really do *not* want a memory DOS potential >>> increased by a lot just opening some file-restore tabs, this actually should get more >>> restrictions (for common "non powerfull" users), not less.. >> >> understandable, so i can do that, but maybe it's time we rethink the file-restore >> mechanism as a whole, since it's currently rather inergonomic: > > IMO the current approach is very ergonomic as the user doesn't has to care > at all about details, I'd like to keep it that way as much as possible. > While we may need to add some every extra setting or extra work load for users/admins > should be kept at a minimum. understandable > >> >> * users don't know how many and which file restore vms are running, they >> may not even know it starts a vm at all > > that isn't an issue, the user wants file restores, not care about managing > its details. yes, but ... (see my answer below) > >> * regardless with/without my patch, the only thing necessary to start a bunch vms >> is VM.Backup to the vm and Datastore.AllocateSpace on the storage >> (which in turn is probably enough to create an arbitrary amount of backups) >> * having arbitrary sized disks/fs inside, no fixed amount we give will always >> be enough >> >> so here some proposals on how to improve that (we won't implement all of them >> simultaneously, but maybe something from that list is usable) >> * make the list of running file-restore vms visible, and maybe add a manual 'shutdown' > > the VM lives for a minute and can be spawn in milliseconds, so not really seeing > the benefit of that, a misguided/bad/... user can much faster spawn such VMs that > one can stop them and the syslog already contains basic info. just to clarify: the current shutdown timeout is 10 minutes (600s) after the last action (maybe we should reduce that?) anyway, the point of this was mostly intended for the point below: > >> * limit the amount of restore vms per user (or per vm?) >> - this would need the mechanism from above anyway, since otherwise either the user >> cannot start the restore vm or we abort an older vm (with possibly running >> download operation) > > which mechanism? The list of running VMs is only required in the backend > and we can already find that info out there. > if we limit this in any way (regardless if memory/count) the user has a problem when reaching the limit. 
either we * throw an error on start and the user is unable do the file restore until one (invisible) vm vanishes (without a way of knowing when that will be) * letting the kernel kill/kill manually another vm (imho a no go, since there might be running operations) so to 'fix' that we could show some running file restore list (then the user has control over it) your suggestion with the 'finish' button is also good, but maybe we should generally do that when closing the file restore window? because when the user closes it without 'finishing' there again is no way to know which vm is actually running (also it does not help when the user closes the tab, browser, etc. and also does not do anything for cli users (granted root@pam only)) >> * make the vm memory configurable (per user/vm/globally?) > > meh, but will be one of the better mechanisms with the priv/unpriv and > different limits enforced via a cgroup (see below) > >> * limit the global memory usage for file restore vms >> - again needs some control mechanism for stopping/seeing these vms > > Not really, just needs a cgroup for limiting all. Unpriv user A really > won't be able to stop the file-restore VM of unpriv User B, so this is not > really gaining anything - you cannot expect that an admin is around all > the time. > >> * throw away the automatic starting of vms, and make it explicit, i.e. > > NACK, worsens UX a lot without really a benefit, the VM will still be > started in the end, whatever limiting mechanism could be applied without > any dialogue anyway.. > >> make the user start/shutdown a vm manually >> - we can have some 'configuration panel' before starting (like with restore) >> - the user is aware it's starting >> - still needs some mechanism to see them, but with a manual starting api call >> it's easier to have e.g. a running worker that can be stopped >> > > > As said weeks ago, the simple stop gap for now is to just opt into more possible > memory once we got a sufficiently privilege (e.g., VM.Allocate and/or Sys.Modify > on / as a starter) for a user, this does away the basic woes now. Sure, a priv > and unpriv user may get to "open" the same restore VM, and depending on order > it will then be able to use more or less resources, but that won't matter much > in practice and the same is true for the debug flag. > > As small addition, with possibly nice in practice effects: > add a "Finish" button to the file-restore window, that actively sends a signal > to the VM on press which then will reduce the idle shutdown timer to ~3s (to avoid > breaking other currently restores or having that open), that way you don't need > extra lists and managements as most of the time it happens indirectly just fine. > > Additionally we can put all VMs in a cgroup with max mem configured, that really > could be a new setting in the node or datacenter.cfg, can easily be nested as in: > > - root restore CG with a total limit, restores started by priv'd user start here > '- user A CG with a per-unpriv-user-limit > '- user B CG with a per-unpriv-user-limit > > That way we would not need to limit on number of restores, which we don't care > but the actual resource we care about. Side benefit, could get reduced CPU shares > so that scheduling prefers the "real" workload. > > Another possibility would be to also evaluate the MicroVM machine type to reduce > the basic impact of the OS - tbh. I'm not sure if Stefan checked that out. 
> > And please note that this all is mostly an artificial problem anyway, ZFS isn't > that popular inside of guests, which are the prime target for file-restore, > and especially not in form of using a huge set of disks as redundancy et al. is > normally covered more centrally on the host, so iff its rather used for snapshots > (which lvm-thin or btrfs may be a better option inside guests, as they're in- > kernel-tree and one doesn't pay such a (relatively) big memory tax. > > IIRC the issue even only got reported from internal testing of PVE or the like, > iow. odd setups. So please lets keep this simple and avoid breaking UX for all > the actual realistic uses cases. > > So please, make this opt-in at PBS site first, then, opt-in for the respective > user class (even just root@pam can be argued if really unsure), else I'll need > to revert this for the next bump. yes, besides my remarks above seems like a sensible solution. just to clarify that i understood you correctly i'd do the following now: * add a flag to file-restore that enables the increasing memory behaviour * check the user privs in the pve-storage api and automatically enable that flag later we can introduce the cgroup mechanism ^ permalink raw reply [flat|nested] 11+ messages in thread
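For the first point, a flag that enables the increased-memory behaviour, the opt-in could be as small as only building the hotplug-capable "-m" argument when the flag is set and falling back to the current fixed size otherwise. A hypothetical sketch; the function, the flag and its plumbing are not from the series, only the argument format and DYNAMIC_MEMORY_MB are taken from the applied patch:

    const DYNAMIC_MEMORY_MB: usize = 1024;

    // Hypothetical: only announce hotpluggable slots/maxmem to QEMU when the
    // caller opted in; otherwise keep the plain fixed-size "-m" value.
    fn memory_arg(base_ram_mb: usize, allow_dynamic: bool) -> String {
        if allow_dynamic {
            format!(
                "{base_ram_mb}M,slots=1,maxmem={}M",
                base_ram_mb + DYNAMIC_MEMORY_MB
            )
        } else {
            base_ram_mb.to_string()
        }
    }

    fn main() {
        assert_eq!(memory_arg(256, false), "256");
        assert_eq!(memory_arg(256, true), "256M,slots=1,maxmem=1280M");
    }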
end of thread, other threads: [~2022-11-08 10:07 UTC | newest]

Thread overview: 11+ messages
2022-10-31 11:39 [pbs-devel] [RFC PATCH backup-restore-image/proxmox-backup] improve file-restore for zpools Dominik Csapak
2022-10-31 11:39 ` [pbs-devel] [RFC PATCH backup-restore-image 1/1] add kernel config options for memory hotplug Dominik Csapak
2022-11-07 12:48   ` [pbs-devel] applied: " Wolfgang Bumiller
2022-10-31 11:39 ` [pbs-devel] [PATCH proxmox-backup 1/2] file-restore: fix deprecated qemu parameters Dominik Csapak
2022-11-04 12:29   ` [pbs-devel] applied: " Wolfgang Bumiller
2022-10-31 11:39 ` [pbs-devel] [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools Dominik Csapak
2022-11-07 12:35   ` [pbs-devel] applied: " Wolfgang Bumiller
2022-11-07 17:15     ` Thomas Lamprecht
2022-11-08  7:59       ` Dominik Csapak
2022-11-08  9:19         ` Thomas Lamprecht
2022-11-08 10:07           ` Dominik Csapak