Subject: Re: [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
From: Thomas Lamprecht
To: Stefan Reiter, Proxmox VE development discussion, pbs-devel@lists.proxmox.com
Date: Wed, 17 Mar 2021 14:59:05 +0100

On 17.03.21 14:37, Stefan Reiter wrote:
> On 16/03/2021 21:17, Thomas Lamprecht wrote:
>> On 03.03.21 10:56, Stefan Reiter wrote:
>>> Values chosen by fair dice roll, seems to be a good sweet spot on my
>>> machine where any less causes performance degradation but any more
>>> doesn't really make it go any faster.
>>>
>>> Keep in mind that those values are per drive in an actual restore.
>>>
>>> Signed-off-by: Stefan Reiter
>>> ---
>>>
>>> Depends on new proxmox-backup.
>>>
>>> v2:
>>> * unchanged
>>>
>>>  src/restore.rs | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/restore.rs b/src/restore.rs
>>> index 0790d7f..a1acce4 100644
>>> --- a/src/restore.rs
>>> +++ b/src/restore.rs
>>> @@ -218,15 +218,16 @@ impl RestoreTask {
>>>          let index = client.download_fixed_index(&manifest, &archive_name).await?;
>>>          let archive_size = index.index_bytes();
>>> -        let most_used = index.find_most_used_chunks(8);
>>> +        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache
>>
>>
>>>          let file_info = manifest.lookup_file_info(&archive_name)?;
>>>
>>> -        let chunk_reader = RemoteChunkReader::new(
>>> +        let chunk_reader = RemoteChunkReader::new_lru_cached(
>>>              Arc::clone(&client),
>>>              self.crypt_config.clone(),
>>>              file_info.chunk_crypt_mode(),
>>>              most_used,
>>> +            64, // 256 MB LRU cache
>>
>> how does this work with low(er) memory situations? Lots of people do not over
>> dimension their memory that much, and especially the need for mass-recovery
>> could seem to correlate with reduced resource availability (a node failed, now
>> I need to restore X backups on my node), so multiple restore jobs may run in
>> parallel, and they all may have even multiple disks, so tens of GiB of memory
>> just for the cache are not that unlikely.
>
> This is a separate function from the regular restore, so it currently only
> affects live-restore. This is not an operation you would usually do under
> memory constraints anyway, and regular restore is unaffected if you just want
> the data.

And how exactly do you figure/argue that users won't use it if easily available?
Users *will* use this in a memory constrained environment, as it gets their
guest up again faster - cue mass restore on a node with not many resources left.

> Upcoming single-file restore too though, I suppose, where it might make more
> sense...
>
>>
>> How is the behavior, hard failure if memory is not available? Also, some
>> archives may be smaller than 256 MiB (EFI disk??) so there it'd be weird to
>> have a 256 MiB cache and get 64 MiB of most-used chunks if that's all/more
>> than it would actually need to be..
>
> Yes, if memory is unavailable it is a hard error. Memory should not be
> pre-allocated however, so restoring this way will only ever use as much memory
> as the disk size (not accounting for overhead).

So basically RSS is increased by chunk-sized blocks. But an alloc error is not
a hard error here for the total operation, couldn't we catch that and continue
with the LRU size we actually have allocated?
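Rough sketch of the degrade-instead-of-fail idea. The cache type and its
interface here are made up for illustration (the real RemoteChunkReader
caching is structured differently), and it leans on a fallible allocation API
such as Vec::try_reserve_exact:

use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU chunk cache that shrinks itself instead of turning an
/// allocation failure into a hard error for the whole restore.
struct DegradingLruCache {
    capacity: usize,
    map: HashMap<[u8; 32], Vec<u8>>,
    order: VecDeque<[u8; 32]>, // front = least recently used
}

impl DegradingLruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Cache a chunk; on allocation failure, evict and retry with a smaller
    /// effective capacity instead of propagating the error.
    fn insert(&mut self, digest: [u8; 32], data: &[u8]) {
        if self.capacity == 0 {
            return; // degraded all the way down, act as a no-op cache
        }
        // evict down to capacity first
        while self.order.len() >= self.capacity {
            match self.order.pop_front() {
                Some(old) => { self.map.remove(&old); }
                None => break,
            }
        }
        let mut buf = Vec::new();
        // try_reserve_exact reports OOM instead of aborting the process
        while buf.try_reserve_exact(data.len()).is_err() {
            match self.order.pop_front() {
                Some(old) => {
                    self.map.remove(&old);
                    self.capacity = self.capacity.saturating_sub(1);
                }
                None => return, // nothing left to evict, just skip caching
            }
        }
        buf.extend_from_slice(data);
        self.map.insert(digest, buf);
        self.order.push_back(digest);
    }

    fn get(&mut self, digest: &[u8; 32]) -> Option<&[u8]> {
        if let Some(pos) = self.order.iter().position(|d| d == digest) {
            let d = self.order.remove(pos).unwrap();
            self.order.push_back(d); // mark as most recently used
        }
        self.map.get(digest).map(|v| v.as_slice())
    }
}

The point being: a failed allocation would just mean we cache less from then
on, the restore itself keeps going.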
>
>>
>> There may be the reversed situation too, beefy fast node with lots of memory
>> and restore is used as recovery or migration, but network bw/latency to PBS
>> is not that good - so a bigger cache could be wanted.
>
> The reason I chose the numbers I did was that I couldn't see any real
> performance benefits by going higher, though I didn't specifically test with
> slow networking.
>
> I don't believe more cache would improve the situation there though, this is
> mostly to avoid the random access from the guest and the linear access from
> the block-stream operation interfering with each other, and to allow multiple
> smaller guest reads within the same chunk to be served quickly.

What are the workloads you tested to be so sure about this?

From the above statement I'd think that any workload with a working set bigger
than 256 MiB would benefit? So basically any production DB load (albeit that
should be handled by the DB's memory caching, so maybe not the best example).

I'm just thinking that exposing this as a knob could help; it doesn't have to
be prominently placed, but it would be nice to have. A rough sketch of what
such a knob could look like is below, after the quoted hunk.

>
>>
>> Maybe we could get the available memory and use that as hint, I mean as
>> memory usage can be highly dynamic it will never be perfect, but better than
>> just ignoring it..
>
> If anything, I'd make it user-configurable - I don't think a heuristic would
> be a good choice here.

Yeah, a heuristic is not a good option, as we cannot know how the system memory
situation will develop in the future.

>
> This way we could also set it smaller for single-file restore for example -
> on the other hand, that adds another parameter to the already somewhat
> cluttered QEMU<->Rust interface.

cue versioned structs incoming ;)

>
>>
>>>          );
>>>          let reader = AsyncIndexReader::new(index, chunk_reader);
>>>
>>
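Regarding the interface clutter: a size-prefixed parameter struct could keep
it to a single new argument. Purely illustrative - none of these names exist
in proxmox-backup-qemu, and the defaults just mirror the values from this
patch (4 MiB fixed chunks):

use std::mem::size_of;

/// Bundle the cache tunables in one size-prefixed struct instead of adding
/// one scalar argument per knob to the QEMU <-> Rust interface.
#[repr(C)]
pub struct RestoreCacheParams {
    /// Caller sets this to the struct size it was compiled against, so the
    /// library can tell which fields the caller actually knows about.
    pub struct_size: u64,
    /// Number of most-used chunks to pre-load; 0 = library default.
    pub most_used_chunks: u64,
    /// LRU chunk cache capacity in chunks; 0 = library default.
    pub lru_chunk_slots: u64,
}

const DEFAULT_MOST_USED: u64 = 16; // 16 * 4 MiB = 64 MiB
const DEFAULT_LRU_SLOTS: u64 = 64; // 64 * 4 MiB = 256 MiB

/// Resolve caller-supplied knobs, falling back to the built-in defaults for
/// callers built against an older (smaller) version of the struct.
fn effective_cache_params(p: &RestoreCacheParams) -> (u64, u64) {
    if (p.struct_size as usize) < size_of::<RestoreCacheParams>() {
        return (DEFAULT_MOST_USED, DEFAULT_LRU_SLOTS);
    }
    let most_used = if p.most_used_chunks == 0 { DEFAULT_MOST_USED } else { p.most_used_chunks };
    let lru_slots = if p.lru_chunk_slots == 0 { DEFAULT_LRU_SLOTS } else { p.lru_chunk_slots };
    (most_used, lru_slots)
}

A zeroed or too-small struct from an older caller then simply means "use the
defaults", and single-file restore could pass smaller values without yet
another interface bump.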