Date: Tue, 17 Jan 2023 13:16:20 +0100
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/2] partial fix #3701: sync/pull: add transfer-last parameter
Message-Id: <1673953222.69xzeyon62.astroid@yuna.none>
In-Reply-To: <20230111145210.516392-2-s.hanreich@proxmox.com>
References: <20230111145210.516392-1-s.hanreich@proxmox.com> <20230111145210.516392-2-s.hanreich@proxmox.com>

On January 11, 2023 3:52 pm, Stefan Hanreich wrote:
> Specifying the transfer-last parameter limits the number of backups
> that get synced via the pull command/sync job. The parameter specifies
> how many of the N latest backups should get pulled/synced. All other
> backups will get skipped.
>
> This is particularly useful in situations where the sync target has
> less disk space than the source. Syncing all backups from the source
> is not possible if there is not enough disk space on the target.
> Additionally this can be used for limiting the amount of data
> transferred, reducing load on the network.
>
> Signed-off-by: Stefan Hanreich
> ---
> I had to make slight adjustments to Wolfgang's proposed condition because it
> wouldn't work in cases where transfer-last was greater than the total amount
> of backups available.
> Nevertheless, the condition should now be a lot less obtuse and easier to read.

see below for a suggestion for further improvement

one high-level comment: I am not sure about the interaction between re-syncing
the last synced snapshot and transfer-last. I think I'd prefer the last synced
snapshot to always be re-synced (the main purpose is to get the backup log/notes
in case backup and pull/sync align just right, so not much data should be
transferred/used), which is not happening at the moment (but it is trivially
implemented, it just needs an additional condition in the transfer-last check..).
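to make that concrete, a rough self-contained sketch of the kind of skip check
I have in mind (plain Rust, `should_skip`, `cutoff` and `last_sync` are made-up
names for illustration, not taken from the patch):

// decide whether the snapshot at position `pos` (list sorted oldest first)
// gets skipped: everything below the transfer-last cutoff is skipped, except
// for the newest snapshot that already exists locally, which we still want to
// re-sync to pick up notes and the backup log
fn should_skip(pos: usize, cutoff: usize, snapshot_time: i64, last_sync: Option<i64>) -> bool {
    let below_cutoff = pos < cutoff;
    let is_resync_candidate = last_sync == Some(snapshot_time);
    below_cutoff && !is_resync_candidate
}

fn main() {
    // 5 remote snapshots, transfer-last = 2 => cutoff = 3; the snapshot at
    // pos 1 is already present locally and should still be re-synced
    let times = [10, 20, 30, 40, 50];
    let (cutoff, last_sync) = (3, Some(20));
    for (pos, time) in times.iter().enumerate() {
        println!("pos {pos}: skip = {}", should_skip(pos, cutoff, *time, last_sync));
    }
}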
> 
>  pbs-api-types/src/jobs.rs         | 11 +++++++++++
>  src/api2/config/sync.rs           |  9 +++++++++
>  src/api2/pull.rs                  | 10 +++++++++-
>  src/bin/proxmox-backup-manager.rs | 11 ++++++++++-
>  src/server/pull.rs                | 17 ++++++++++++++++-
>  5 files changed, 55 insertions(+), 3 deletions(-)
> 

[..]

> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 65eedf2c..81f4faf3 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -1,5 +1,6 @@
>  //! Sync datastore from remote server
>  
> +use std::cmp::min;
>  use std::collections::{HashMap, HashSet};
>  use std::io::{Seek, SeekFrom};
>  use std::sync::atomic::{AtomicUsize, Ordering};
> @@ -59,6 +60,8 @@ pub(crate) struct PullParameters {
>      group_filter: Option<Vec<GroupFilter>>,
>      /// Rate limits for all transfers from `remote`
>      limit: RateLimitConfig,
> +    /// How many snapshots should be transferred at most (taking the newest N snapshots)
> +    transfer_last: Option<usize>,
>  }
>  
>  impl PullParameters {
> @@ -78,6 +81,7 @@ impl PullParameters {
>          max_depth: Option<usize>,
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
> +        transfer_last: Option<usize>,
>      ) -> Result<Self, Error> {
>          let store = DataStore::lookup_datastore(store, Some(Operation::Write))?;
>  
> @@ -109,6 +113,7 @@ impl PullParameters {
>              max_depth,
>              group_filter,
>              limit,
> +            transfer_last,
>          })
>      }
>  
> @@ -573,7 +578,7 @@ impl std::fmt::Display for SkipInfo {
>      fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>          write!(
>              f,
> -            "skipped: {} snapshot(s) ({}) older than the newest local snapshot",
> +            "skipped: {} snapshot(s) ({}) - older than the newest local snapshot or excluded by transfer-last",

it would possibly be nicer to store a "reason" in SkipInfo, and count the
"already synced" and "excluded by transfer-last" snapshots separately (if only
so that an admin can verify that transfer-last has the desired effect) - see
the rough sketch at the end of this mail.

>              self.count,
>              self.affected().map_err(|_| std::fmt::Error)?
>          )
> @@ -646,6 +651,11 @@ async fn pull_group(
>              count: 0,
>          };
>  
> +    let total_amount = list.len();
> +
> +    let mut transfer_amount = params.transfer_last.unwrap_or(total_amount);
> +    transfer_amount = min(transfer_amount, total_amount);

I'd prefer this to just calculate the cutoff for the check below, e.g.

let XXX = params
    .transfer_last
    .map(|count| total_amount.saturating_sub(count))
    .unwrap_or_default();

(or an equivalent match construct, or some variant of your unwrap+min construct
- doesn't really matter as long as it treats underflows correctly)
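for illustration only, a tiny standalone example of that calculation (the
scaffolding around it is made up, `transfer_last`/`total_amount` just mirror
the names in the patch) - it also covers the transfer-last > total case
mentioned in the cover letter:

// suggested cutoff: index of the first snapshot that should actually be
// transferred; everything before it is a candidate for skipping
fn cutoff(total_amount: usize, transfer_last: Option<usize>) -> usize {
    transfer_last
        .map(|count| total_amount.saturating_sub(count))
        .unwrap_or_default()
}

fn main() {
    assert_eq!(cutoff(8, Some(2)), 6);  // keep only the newest 2 of 8
    assert_eq!(cutoff(8, Some(10)), 0); // transfer-last > total: nothing gets cut off
    assert_eq!(cutoff(8, None), 0);     // parameter not set: nothing gets cut off
    println!("saturating_sub keeps all edge cases underflow-free");
}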
> +
>      for (pos, item) in list.into_iter().enumerate() {
>          let snapshot = item.backup;
>  
> @@ -668,6 +678,11 @@ async fn pull_group(
>              }
>          }
>  
> +        if pos < (total_amount - transfer_amount) {

because this screams "underflow" to me (even though that is checked above in
the code in your variant, there might be more stuff added in-between and once
it's no longer on the same screen this adds cognitive overhead ;)), whereas

if pos < cutoff {
    ..
}

makes it clear that there is no chance of a bad underflow occurring *here*

> +            skip_info.update(snapshot.time);
> +            continue;
> +        }
> +
>          // get updated auth_info (new tickets)
>          let auth_info = client.login().await?;

one more related thing that might have room for improvement: the progress info
counts snapshots skipped because of transfer-last as "done", but the order of
logging is:

- print re-sync (if applicable)
- print pulled snapshot progress
- print info about skipped snapshots (if applicable)

it might be better (with SkipInfo split) to print

- print info about skipped < last_sync
- print re-sync (see below though)
- print info about skipped transfer-last
- print pull snapshot progress ("done" can now include skipped snapshots without confusion)

e.g., an example of the status quo (one snapshot synced, transfer-last = 2, a
total of 8 remote snapshots):

2023-01-17T12:49:46+01:00: percentage done: 33.33% (1/3 groups)
2023-01-17T12:49:46+01:00: skipped: 2 snapshot(s) (2022-12-13T11:09:43Z .. 2022-12-16T09:44:34Z) - older than the newest local snapshot or excluded by transfer-last
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z done
2023-01-17T12:49:46+01:00: percentage done: 62.50% (1/3 groups, 7/8 snapshots in group #2)
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z done
2023-01-17T12:49:46+01:00: percentage done: 66.67% (2/3 groups)
2023-01-17T12:49:46+01:00: skipped: 6 snapshot(s) (2022-12-16T09:25:03Z .. 2023-01-12T09:43:15Z) - older than the newest local snapshot or excluded by transfer-last

if we revamp the progress display, it might also make sense to improve the
output for groups that are already 100% synced:

2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z
2023-01-17T12:49:46+01:00: no data changes
2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z done
2023-01-17T12:49:46+01:00: percentage done: 100.00% (3/3 groups)

IMHO we could remove the "re-sync .. done" line (the next line will be a
progress line anyway, either for this snapshot or the whole group!), and
instead add an opening line ("sync group XXX..") that helps when scanning the
log.
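since the SkipInfo split came up twice above, a very rough standalone sketch of
the direction I mean - everything here (SkipReason, the raw i64 timestamps in
the output, ..) is placeholder code for illustration, not a concrete patch:

use std::fmt;

// track *why* snapshots were skipped, so "already synced" and
// "excluded by transfer-last" can be counted and reported separately
enum SkipReason {
    AlreadySynced,
    TransferLast,
}

struct SkipInfo {
    oldest: i64,
    newest: i64,
    count: u64,
    reason: SkipReason,
}

impl SkipInfo {
    fn new(reason: SkipReason) -> Self {
        SkipInfo { oldest: i64::MAX, newest: i64::MIN, count: 0, reason }
    }

    fn update(&mut self, backup_time: i64) {
        self.count += 1;
        self.oldest = self.oldest.min(backup_time);
        self.newest = self.newest.max(backup_time);
    }
}

impl fmt::Display for SkipInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let reason = match self.reason {
            SkipReason::AlreadySynced => "older than the newest local snapshot",
            SkipReason::TransferLast => "excluded by transfer-last",
        };
        write!(
            f,
            "skipped: {} snapshot(s) ({}..{}) - {}",
            self.count, self.oldest, self.newest, reason
        )
    }
}

fn main() {
    // one instance per reason, logged separately
    let mut already_synced = SkipInfo::new(SkipReason::AlreadySynced);
    already_synced.update(1);
    let mut transfer_last = SkipInfo::new(SkipReason::TransferLast);
    transfer_last.update(2);
    transfer_last.update(3);
    println!("{already_synced}");
    println!("{transfer_last}");
}

pull_group could then keep one instance per reason and print them at the points
outlined above, so the two counts stay separate in the task log.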