Date: Tue, 17 Jan 2023 13:16:20 +0100
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/2] partial fix #3701: sync/pull: add transfer-last parameter
Message-Id: <1673953222.69xzeyon62.astroid@yuna.none>
In-Reply-To: <20230111145210.516392-2-s.hanreich@proxmox.com>
References: <20230111145210.516392-1-s.hanreich@proxmox.com> <20230111145210.516392-2-s.hanreich@proxmox.com>

On January 11, 2023 3:52 pm, Stefan Hanreich wrote:
> Specifying the transfer-last parameter limits the number of backups
> that get synced via the pull command/sync job. The parameter specifies
> how many of the N latest backups should get pulled/synced. All other
> backups will get skipped.
>
> This is particularly useful in situations where the sync target has
> less disk space than the source. Syncing all backups from the source
> is not possible if there is not enough disk space on the target.
> Additionally this can be used for limiting the amount of data
> transferred, reducing load on the network.
>
> Signed-off-by: Stefan Hanreich
> ---
> I had to make slight adjustments to Wolfgang's proposed condition because it
> wouldn't work in cases where transfer-last was greater than the total amount
> of backups available.
> Nevertheless, the condition should now be a lot less obtuse and easier to read.

see below for a suggestion for further improvement

one high-level comment: I am not sure about the interaction between re-syncing
the last synced snapshot and transfer-last. I think I'd prefer the last synced
snapshot to always be re-synced (the main purpose is to get the backup log/notes
in case backup and pull/sync align just right, so not much data should be
transferred/used), which is not happening at the moment (but it is trivially
implemented, it just needs an additional condition in the transfer-last check..).
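to make that concrete, a rough self-contained sketch of the kind of skip check
I have in mind (plain Rust, `should_skip`, `cutoff` and `last_sync` are made-up
names for illustration, not taken from the patch):

// decide whether the snapshot at position `pos` (list sorted oldest first)
// gets skipped: everything below the transfer-last cutoff is skipped, except
// for the newest snapshot that already exists locally, which we still want to
// re-sync to pick up notes and the backup log
fn should_skip(pos: usize, cutoff: usize, snapshot_time: i64, last_sync: Option<i64>) -> bool {
    let below_cutoff = pos < cutoff;
    let is_resync_candidate = last_sync == Some(snapshot_time);
    below_cutoff && !is_resync_candidate
}

fn main() {
    // 5 remote snapshots, transfer-last = 2 => cutoff = 3; the snapshot at
    // pos 1 is already present locally and should still be re-synced
    let times = [10, 20, 30, 40, 50];
    let (cutoff, last_sync) = (3, Some(20));
    for (pos, time) in times.iter().enumerate() {
        println!("pos {pos}: skip = {}", should_skip(pos, cutoff, *time, last_sync));
    }
}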
> 
>  pbs-api-types/src/jobs.rs         | 11 +++++++++++
>  src/api2/config/sync.rs           |  9 +++++++++
>  src/api2/pull.rs                  | 10 +++++++++-
>  src/bin/proxmox-backup-manager.rs | 11 ++++++++++-
>  src/server/pull.rs                | 17 ++++++++++++++++-
>  5 files changed, 55 insertions(+), 3 deletions(-)
> 

[..]

> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 65eedf2c..81f4faf3 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -1,5 +1,6 @@
>  //! Sync datastore from remote server
>  
> +use std::cmp::min;
>  use std::collections::{HashMap, HashSet};
>  use std::io::{Seek, SeekFrom};
>  use std::sync::atomic::{AtomicUsize, Ordering};
> @@ -59,6 +60,8 @@ pub(crate) struct PullParameters {
>      group_filter: Option<Vec<GroupFilter>>,
>      /// Rate limits for all transfers from `remote`
>      limit: RateLimitConfig,
> +    /// How many snapshots should be transferred at most (taking the newest N snapshots)
> +    transfer_last: Option<usize>,
>  }
>  
>  impl PullParameters {
> @@ -78,6 +81,7 @@ impl PullParameters {
>          max_depth: Option<usize>,
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
> +        transfer_last: Option<usize>,
>      ) -> Result<Self, Error> {
>          let store = DataStore::lookup_datastore(store, Some(Operation::Write))?;
>  
> @@ -109,6 +113,7 @@ impl PullParameters {
>              max_depth,
>              group_filter,
>              limit,
> +            transfer_last,
>          })
>      }
>  
> @@ -573,7 +578,7 @@ impl std::fmt::Display for SkipInfo {
>      fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>          write!(
>              f,
> -            "skipped: {} snapshot(s) ({}) older than the newest local snapshot",
> +            "skipped: {} snapshot(s) ({}) - older than the newest local snapshot or excluded by transfer-last",

it would possibly be nicer to store a "reason" in SkipInfo, and count the
"already synced" and "excluded by transfer-last" snapshots separately (if only
so that an admin can verify that transfer-last has the desired effect) - see
the rough sketch at the end of this mail.

>              self.count,
>              self.affected().map_err(|_| std::fmt::Error)?
>          )
> @@ -646,6 +651,11 @@ async fn pull_group(
>              count: 0,
>          };
>  
> +    let total_amount = list.len();
> +
> +    let mut transfer_amount = params.transfer_last.unwrap_or(total_amount);
> +    transfer_amount = min(transfer_amount, total_amount);

I'd prefer this to just calculate the cutoff for the check below, e.g.

let XXX = params
    .transfer_last
    .map(|count| total_amount.saturating_sub(count))
    .unwrap_or_default();

(or an equivalent match construct, or some variant of your unwrap+min construct
- doesn't really matter as long as it treats underflows correctly)
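for illustration only, a tiny standalone example of that calculation (the
scaffolding around it is made up, `transfer_last`/`total_amount` just mirror
the names in the patch) - it also covers the transfer-last > total case
mentioned in the cover letter:

// suggested cutoff: index of the first snapshot that should actually be
// transferred; everything before it is a candidate for skipping
fn cutoff(total_amount: usize, transfer_last: Option<usize>) -> usize {
    transfer_last
        .map(|count| total_amount.saturating_sub(count))
        .unwrap_or_default()
}

fn main() {
    assert_eq!(cutoff(8, Some(2)), 6);  // keep only the newest 2 of 8
    assert_eq!(cutoff(8, Some(10)), 0); // transfer-last > total: nothing gets cut off
    assert_eq!(cutoff(8, None), 0);     // parameter not set: nothing gets cut off
    println!("saturating_sub keeps all edge cases underflow-free");
}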
> +
>      for (pos, item) in list.into_iter().enumerate() {
>          let snapshot = item.backup;
>  
> @@ -668,6 +678,11 @@ async fn pull_group(
>              }
>          }
>  
> +        if pos < (total_amount - transfer_amount) {

because this screams "underflow" to me (even though that is checked above in
the code in your variant, there might be more stuff added in-between and once
it's no longer on the same screen this adds cognitive overhead ;)), whereas

if pos < cutoff {
    ..
}

makes it clear that there is no chance of a bad underflow occurring *here*

> +            skip_info.update(snapshot.time);
> +            continue;
> +        }
> +
>          // get updated auth_info (new tickets)
>          let auth_info = client.login().await?;

one more related thing that might have room for improvement: the progress info
counts snapshots skipped because of transfer-last as "done", but the order of
logging is:

- print re-sync (if applicable)
- print pulled snapshot progress
- print info about skipped snapshots (if applicable)

it might be better (with SkipInfo split) to print

- print info about skipped < last_sync
- print re-sync (see below though)
- print info about skipped transfer-last
- print pull snapshot progress ("done" can now include skipped snapshots without confusion)

e.g., an example of the status quo (one snapshot synced, transfer-last = 2, a
total of 8 remote snapshots):

2023-01-17T12:49:46+01:00: percentage done: 33.33% (1/3 groups)
2023-01-17T12:49:46+01:00: skipped: 2 snapshot(s) (2022-12-13T11:09:43Z .. 2022-12-16T09:44:34Z) - older than the newest local snapshot or excluded by transfer-last
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z done
2023-01-17T12:49:46+01:00: percentage done: 62.50% (1/3 groups, 7/8 snapshots in group #2)
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z done
2023-01-17T12:49:46+01:00: percentage done: 66.67% (2/3 groups)
2023-01-17T12:49:46+01:00: skipped: 6 snapshot(s) (2022-12-16T09:25:03Z .. 2023-01-12T09:43:15Z) - older than the newest local snapshot or excluded by transfer-last

if we revamp the progress display, it might also make sense to improve the
output for groups that are already 100% synced:

2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z
2023-01-17T12:49:46+01:00: no data changes
2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z done
2023-01-17T12:49:46+01:00: percentage done: 100.00% (3/3 groups)

IMHO we could remove the "re-sync .. done" line (the next line will be a
progress line anyway, either for this snapshot or the whole group!), and
instead add an opening line ("sync group XXX..") that helps when scanning the
log.
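since the SkipInfo split came up twice above, a very rough standalone sketch of
the direction I mean - everything here (SkipReason, the raw i64 timestamps in
the output, ..) is placeholder code for illustration, not a concrete patch:

use std::fmt;

// track *why* snapshots were skipped, so "already synced" and
// "excluded by transfer-last" can be counted and reported separately
enum SkipReason {
    AlreadySynced,
    TransferLast,
}

struct SkipInfo {
    oldest: i64,
    newest: i64,
    count: u64,
    reason: SkipReason,
}

impl SkipInfo {
    fn new(reason: SkipReason) -> Self {
        SkipInfo { oldest: i64::MAX, newest: i64::MIN, count: 0, reason }
    }

    fn update(&mut self, backup_time: i64) {
        self.count += 1;
        self.oldest = self.oldest.min(backup_time);
        self.newest = self.newest.max(backup_time);
    }
}

impl fmt::Display for SkipInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let reason = match self.reason {
            SkipReason::AlreadySynced => "older than the newest local snapshot",
            SkipReason::TransferLast => "excluded by transfer-last",
        };
        write!(
            f,
            "skipped: {} snapshot(s) ({}..{}) - {}",
            self.count, self.oldest, self.newest, reason
        )
    }
}

fn main() {
    // one instance per reason, logged separately
    let mut already_synced = SkipInfo::new(SkipReason::AlreadySynced);
    already_synced.update(1);
    let mut transfer_last = SkipInfo::new(SkipReason::TransferLast);
    transfer_last.update(2);
    transfer_last.update(3);
    println!("{already_synced}");
    println!("{transfer_last}");
}

pull_group could then keep one instance per reason and print them at the points
outlined above, so the two counts stay separate in the task log.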