From: Fiona Ebner <f.ebner@proxmox.com>
To: "DERUMIER, Alexandre" <Alexandre.DERUMIER@groupe-cyllene.com>,
	"pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
	"aderumier@odiso.com" <aderumier@odiso.com>
Subject: Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
Date: Wed, 25 Jan 2023 10:54:10 +0100	[thread overview]
Message-ID: <5c097397-3777-66a9-579c-b7d99941c399@proxmox.com> (raw)
In-Reply-To: <b68f0f56607950351ee5f291b24fc6ad84ee5bf7.camel@groupe-cyllene.com>

Am 25.01.23 um 10:00 schrieb DERUMIER, Alexandre:
> Le mardi 24 janvier 2023 à 14:06 +0100, Fiona Ebner a écrit :
>> Am 04.01.23 um 07:43 schrieb Alexandre Derumier:
>>> +sub get_virtiomem_block_size {
>>> +    my ($conf) = @_;
>>> +
>>> +    my $MAX_MEM = get_max_mem($conf);
>>> +    my $static_memory = get_static_mem($conf);
>>> +    my $memory = get_current_memory($conf->{memory});
>>> +
>>> +    # virtio-mem can map at most 32000 blocks.
>>> +    # Try to use the lowest block size; lower = more chance to unplug memory.
>>> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
>>> +    #2MB is the minimum to be aligned with THP
>>> +    $blocksize = 2**(ceil(log($blocksize)/log(2)));
>>> +    $blocksize = 4 if $blocksize < 4;
>>
>> Why suddenly 4?
> 
> I have added a note in the commit :
> 
>> Note: Currently, Linux only supports a 4MB virtio-mem block size; 2MB
>> support is currently in progress.
>>
> 
> So 2MB is valid from the QEMU side, but the Linux guest kernel doesn't
> support it yet. At least, you need to use a multiple of 4MB. You can
> remove/add two 2MB blocks at the same time, but it doesn't seem to be
> atomic, so I think it's better to use the minimum currently supported
> block size.

Sorry, I missed that.
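
For the archives, here is how the calculation plays out numerically; the
values below are made-up examples, not taken from the patch:

    use POSIX qw(ceil);

    # hypothetical example: 64 GiB max memory, 4 GiB static memory (in MiB)
    my $MAX_MEM = 65536;
    my $static_memory = 4096;

    # (65536 - 4096) / 32000 = 1.92 MiB per block
    my $blocksize = ($MAX_MEM - $static_memory) / 32000;

    # round up to the next power of two -> 2 MiB
    $blocksize = 2**(ceil(log($blocksize)/log(2)));

    # clamp to 4 MiB, the smallest block size current Linux guests support
    $blocksize = 4 if $blocksize < 4;

    print "$blocksize\n"; # 4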

> Maybe later, we could extend the virtio=X option to tell the supported
> virtio version (virtio=1.1, virtio=1.2) and enable the supported
> features?

If we really need to and can't detect it otherwise, sure.

>>> +my sub balance_virtiomem {
>>
>> This function is rather difficult to read. The "record errors and
>> filter" logic really should be its own patch after the initial
>> support.
>> FWIW, I tried my best and it does seem fine :)
>>
>> But it's not clear to me that we even want that logic. Is it really
>> that common for qom-set to take so long that it is worth introducing
>> all this additional handling/complexity? Or should it just be a hard
>> error if qom-set still didn't have an effect on a device after 5
>> seconds?
>>
> From my tests, it can take 2-3 seconds on unplug on a bigger setup. I'm
> doing it in parallel to be faster, to avoid waiting nbdimm * 2-3 seconds.

Sure, doing it in parallel is perfectly fine. I'm just thinking that
switching gears (too early) and redirecting the request might not be
ideal. You also issue an additional qom-set to go back to
$virtiomem->{current} * 1024 * 1024 if a request didn't make progress in
time. But to be sure that request worked, we'd also need to monitor it
;) I think issuing that request is fine as-is, but if there is a
"hanging" device, we really can't do much. And I do think the user
should be informed via an appropriate error if there is a problematic
device.

Maybe we can use 10 seconds instead of 5 (2-3 seconds already sounds too
close to 5 IMHO), so that we have a good margin, and just die instead of
trying to redirect the request to another device. After issuing the
reset request and writing our best guess of what the memory is to the
config of course.

If it really is an issue in practice that certain devices often take too
long for whatever reason, we can still add the redirect logic. But
starting out, I feel like it's not worth the additional complexity.
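
To make it concrete, roughly what I have in mind; this is only a sketch
under assumptions (mon_cmd()/qom-get as used elsewhere in qemu-server, the
"/machine/peripheral/$id" QOM path, and a $virtiomems hashref of device
id => { target => bytes } are all assumed, not taken from the patch):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Poll all virtio-mem devices until each reports its target size, or
    # die after $timeout seconds instead of redirecting the request.
    my sub wait_virtiomems_or_die {
        my ($vmid, $virtiomems, $timeout) = @_;

        my $starttime = time();
        for (;;) {
            my $all_done = 1;
            for my $id (sort keys %$virtiomems) {
                my $dev = $virtiomems->{$id};
                next if $dev->{done};
                my $size = mon_cmd(
                    $vmid, 'qom-get',
                    path => "/machine/peripheral/$id",
                    property => 'size',
                );
                if ($size == $dev->{target}) {
                    $dev->{done} = 1;
                } else {
                    $all_done = 0;
                }
            }
            return if $all_done;

            if (time() - $starttime > $timeout) {
                # here: issue the reset qom-set and write our best guess of
                # the current memory to the config, then fail hard
                die "timeout waiting for virtio-mem devices to reach the requested size\n";
            }
            sleep 1;
        }
    }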

>> Would it actually be better to just fill up the first, then the
>> second
>> etc. as needed, rather than balancing? My gut feeling is that having
>> fewer "active" devices is better. But this would have to be tested
>> with
>> some benchmarks of course.
> 
> Well, from a NUMA perspective, you really want to balance as much as
> possible. (That's why, with classic hotplug, we add/remove DIMMs on each
> socket alternately.)
> 
> That's the whole point of NUMA: reading the nearest memory attached to
> the processor where the process is running.
> 
> That's a main advantage of virtio-mem vs ballooning (which doesn't
> handle NUMA and removes pages randomly on any socket).

Makes sense. Thanks for the explanation!
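
(For readers of the archive, the balancing idea in a nutshell; a
simplified, hypothetical helper, not the patch's actual balance_virtiomem,
and it ignores per-device limits:)

    # Split the requested hotplug memory evenly across the per-NUMA-node
    # virtio-mem devices, so every socket keeps roughly the same amount.
    my sub split_mem_across_nodes {
        my ($node_ids, $total_mb, $blocksize) = @_;

        my $nodes = scalar(@$node_ids);
        my $blocks = int($total_mb / $blocksize);
        my $per_node = int($blocks / $nodes);
        my $extra = $blocks % $nodes;

        my %target_mb;
        for my $id (@$node_ids) {
            my $node_blocks = $per_node + ($extra-- > 0 ? 1 : 0);
            $target_mb{$id} = $node_blocks * $blocksize;
        }
        return \%target_mb;
    }

    # e.g. 10240 MiB over 2 nodes with 4 MiB blocks => 5120 MiB per node
    my $targets = split_mem_across_nodes([0, 1], 10240, 4);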




Thread overview: 30+ messages
2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests Alexandre Derumier
2023-01-24 13:04   ` Fiona Ebner
2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser Alexandre Derumier
2023-01-24 13:04   ` Fiona Ebner
2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem Alexandre Derumier
2023-01-24 13:04   ` Fiona Ebner
2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option Alexandre Derumier
2023-01-24 13:05   ` Fiona Ebner
2023-01-27 15:03     ` DERUMIER, Alexandre
2023-01-30  8:03       ` Fiona Ebner
2023-01-30  8:45         ` DERUMIER, Alexandre
2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max Alexandre Derumier
2023-01-24 13:05   ` Fiona Ebner
2023-01-27 15:15     ` DERUMIER, Alexandre
2023-01-30  8:04       ` Fiona Ebner
2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined Alexandre Derumier
2023-01-24 13:06   ` Fiona Ebner
2023-01-27 15:52     ` DERUMIER, Alexandre
2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests Alexandre Derumier
2023-01-24 13:06   ` Fiona Ebner
2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support Alexandre Derumier
2023-01-24 13:06   ` Fiona Ebner
2023-01-25  9:00     ` DERUMIER, Alexandre
2023-01-25  9:54       ` Fiona Ebner [this message]
2023-01-25 10:28         ` DERUMIER, Alexandre
2023-01-25 10:52           ` Fiona Ebner
2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests Alexandre Derumier
2023-01-24 13:08   ` Fiona Ebner
2023-01-24 13:08 ` [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Fiona Ebner
