Message-ID: <7d56a05f-be11-586c-f5af-f6fbd2cd2d4e@proxmox.com>
Date: Tue, 8 Nov 2022 10:19:20 +0100
To: Dominik Csapak <d.csapak@proxmox.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>,
 Wolfgang Bumiller <w.bumiller@proxmox.com>
References: <20221031113953.3111599-1-d.csapak@proxmox.com>
 <20221031113953.3111599-4-d.csapak@proxmox.com>
 <20221107123538.i355vavyncl5edm7@casey.proxmox.com>
 <36968ac0-73c1-466a-e122-485a2598d14b@proxmox.com>
 <a4b48b1d-530e-8e14-ddef-bb27ff5b325b@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <a4b48b1d-530e-8e14-ddef-bb27ff5b325b@proxmox.com>
Subject: Re: [pbs-devel] applied: [RFC PATCH proxmox-backup 2/2]
 file-restore: dynamically increase memory of vm for zpools

On 08/11/2022 at 08:59, Dominik Csapak wrote:
> On 11/7/22 18:15, Thomas Lamprecht wrote:
>> On 07/11/2022 at 13:35, Wolfgang Bumiller wrote:
>>> applied
>>
>> meh, can we please get this opt-in and only enable it for root@pam or for
>> users with some powerful priv on /, as talked about as the chosen approach
>> to allow more memory the last time this came up (off list IIRC)... I really
>> do *not* want the memory DoS potential increased by a lot just by opening
>> some file-restore tabs; this actually should get more restrictions (for
>> common "non powerful" users), not fewer..
>
> understandable, so I can do that, but maybe it's time we rethink the
> file-restore mechanism as a whole, since it's currently rather unergonomic:

IMO the current approach is very ergonomic, as the user doesn't have to care
at all about details; I'd like to keep it that way as much as possible. While
we may need to add some settings, every extra setting or extra workload for
users/admins should be kept to a minimum.

>
> * users don't know how many and which file-restore VMs are running, they
> may not even know it starts a VM at all

that isn't an issue; the user wants file restores, not to care about managing
their details.

> * regardless of with/without my patch, the only thing necessary to start a
> bunch of VMs is VM.Backup on the VM and Datastore.AllocateSpace on the
> storage (which in turn is probably enough to create an arbitrary amount of
> backups)
> * with arbitrarily sized disks/filesystems inside, no fixed amount we give
> will always be enough
>
> so here are some proposals on how to improve that (we won't implement all
> of them simultaneously, but maybe something from that list is usable)
> * make the list of running file-restore VMs visible, and maybe add a manual
> 'shutdown'

the VM lives for a minute and can be spawned in milliseconds, so I don't
really see the benefit of that; a misguided/bad/... user can spawn such VMs
much faster than one can stop them, and the syslog already contains the basic
info.

> * limit the amount of restore VMs per user (or per VM?)
>   - this would need the mechanism from above anyway, since otherwise either
>     the user cannot start the restore VM or we abort an older VM (with a
>     possibly running download operation)

which mechanism? The list of running VMs is only required in the backend, and
we can already find that info out there.

> * make the vm memory configurable (per user/vm/globally?)

meh, but this will be one of the better mechanisms, combined with the
priv/unpriv split and different limits enforced via a cgroup (see below)

> * limit the global memory usage for file restore VMs
>   - again needs some control mechanism for stopping/seeing these VMs

Not really, it just needs one cgroup limiting them all. Unpriv'd user A
really won't be able to stop the file-restore VM of unpriv'd user B, so this
is not really gaining anything - you cannot expect that an admin is around
all the time.

> * throw away the automatic starting of VMs, and make it explicit, i.e.

NACK, that worsens UX a lot without a real benefit; the VM will still be
started in the end, and whatever limiting mechanism we pick could be applied
without any dialogue anyway..

>   make the user start/shutdown a VM manually
>   - we can have some 'configuration panel' before starting (like with
>     restore)
>   - the user is aware it's starting
>   - still needs some mechanism to see them, but with a manual starting api
>     call it's easier to have e.g. a running worker that can be stopped
>


As said weeks ago, the simple stopgap for now is to only opt into more
possible memory once a user has a sufficient privilege (e.g., VM.Allocate
and/or Sys.Modify on / as a starter); this does away with the basic woes for
now. Sure, a priv'd and an unpriv'd user may get to "open" the same restore
VM, and depending on the order it will then be able to use more or fewer
resources, but that won't matter much in practice, and the same is true for
the debug flag.
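
To make the intent concrete, roughly the following decision logic is what I
have in mind; a minimal sketch only, all names and example limits are made
up, and the real check would of course go through the ACL/priv infrastructure:

    /// Sketch: pick the restore-VM memory based on whether the user holds
    /// a powerful privilege on /. Constants are example values only.
    fn restore_vm_memory_mib(user_is_privileged: bool, wants_extra: bool) -> usize {
        const BASE_MEMORY_MIB: usize = 128; // small default footprint
        const PRIVILEGED_MAX_MIB: usize = 1024; // opt-in ceiling, example

        if wants_extra && user_is_privileged {
            PRIVILEGED_MAX_MIB
        } else {
            BASE_MEMORY_MIB // unprivileged users never get more
        }
    }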

As a small addition, with possibly nice effects in practice:
add a "Finish" button to the file-restore window that actively sends a signal
to the VM on press, which then reduces the idle shutdown timer to ~3s (to
avoid breaking other currently running restores or a still-open one); that
way you don't need extra lists and management, as most of the time the
shutdown happens indirectly just fine.
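
On the daemon side that could look roughly like the following; a sketch
assuming the idle deadline is kept as a single timestamp, with illustrative
function names, not the actual restore-daemon API:

    use std::sync::atomic::{AtomicI64, Ordering};
    use std::time::{SystemTime, UNIX_EPOCH};

    // Illustrative: the idle-shutdown deadline as a unix timestamp.
    static SHUTDOWN_DEADLINE: AtomicI64 = AtomicI64::new(i64::MAX);

    fn unix_now() -> i64 {
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs() as i64
    }

    // Called on every restore request to keep the VM alive for ~60s more.
    fn watchdog_ping() {
        SHUTDOWN_DEADLINE.store(unix_now() + 60, Ordering::SeqCst);
    }

    // Called on "Finish": only ever *shorten* the deadline to ~3s, never
    // extend it, so other still-running restores keep their ping window.
    fn watchdog_finish() {
        SHUTDOWN_DEADLINE.fetch_min(unix_now() + 3, Ordering::SeqCst);
    }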

Additionally, we can put all VMs in a cgroup with a configured max memory;
that really could be a new setting in the node or datacenter.cfg, and it can
easily be nested, as in:

- root restore CG with a total limit, restores started by a priv'd user start here
 '- user A CG with a per-unpriv-user limit
 '- user B CG with a per-unpriv-user limit

That way we would not need a limit on the number of restores, which we don't
care about, but on the actual resource we do care about. As a side benefit,
the group could get reduced CPU shares so that scheduling prefers the "real"
workload.
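
Setting that up is just a few writes to the cgroup2 filesystem; a sketch with
made-up paths and values, with the caveat that the v2 "no internal processes"
rule means the priv'd restores would in practice live in their own child
group rather than directly in the top one:

    use std::fs;
    use std::io;

    /// Sketch: nested cgroup-v2 hierarchy for restore VMs, example values.
    fn setup_restore_cgroups(total: u64, per_user: u64, users: &[&str]) -> io::Result<()> {
        let root = "/sys/fs/cgroup/file-restore";
        fs::create_dir_all(root)?;
        fs::write(format!("{root}/memory.max"), total.to_string())?; // total limit
        fs::write(format!("{root}/cpu.weight"), "50")?; // below the default of 100
        fs::write(format!("{root}/cgroup.subtree_control"), "+memory")?;

        // priv'd restores: no extra limit beyond the total one
        fs::create_dir_all(format!("{root}/privileged"))?;

        for user in users {
            let dir = format!("{root}/{user}");
            fs::create_dir_all(&dir)?;
            fs::write(format!("{dir}/memory.max"), per_user.to_string())?;
            // a VM gets attached by writing its PID to {dir}/cgroup.procs
        }
        Ok(())
    }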

Another possibility would be to also evaluate the MicroVM machine type to
reduce the base impact of the OS - tbh. I'm not sure if Stefan checked that
out.
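
For reference, the invocation could look roughly like this; a sketch only,
the kernel/initrd paths are placeholders, and note that microvm is
virtio-mmio based, so the disk mapping would need adjusting too:

    use std::process::Command;

    // Sketch: spawn the restore VM with QEMU's microvm machine type and
    // the documented knobs for dropping legacy devices and option ROMs.
    fn spawn_microvm_restore() -> std::io::Result<std::process::Child> {
        Command::new("qemu-system-x86_64")
            .args([
                "-M", "microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off",
                "-nodefaults", "-no-user-config", "-nographic",
                "-enable-kvm", "-m", "128",
                "-kernel", "/path/to/file-restore/bzImage",    // placeholder
                "-initrd", "/path/to/file-restore/initrd.img", // placeholder
                "-append", "quiet",
            ])
            .spawn()
    }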

And please note that this is all mostly an artificial problem anyway: ZFS
isn't that popular inside of guests, which are the prime target for
file-restore, and especially not in the form of a huge set of disks, as
redundancy et al. is normally covered more centrally on the host. So if it's
used at all, it's rather for snapshots, for which lvm-thin or btrfs may be
the better option inside guests, as they're in-kernel-tree and one doesn't
pay such a (relatively) big memory tax.

IIRC the issue even only got reported from internal testing of PVE or the
like, iow. odd setups. So please let's keep this simple and avoid breaking
UX for all the actually realistic use cases.

So please, make this opt-in on the PBS side first, and then opt-in for the
respective user class (even limiting it to just root@pam can be argued for,
if really unsure), else I'll need to revert this before the next bump.