From: Fiona Ebner <f.ebner@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured
Date: Wed, 16 Nov 2022 10:37:18 +0100
Message-ID: <82c69808-1032-ff32-1d23-ceacdc0a11eb@proxmox.com>
In-Reply-To: <ff380260-8de5-57d8-321e-7a1e0b8893cf@proxmox.com>

On 16.11.22 at 08:14, Thomas Lamprecht wrote:
> On 11/11/2022 at 10:28, Fiona Ebner wrote:
>> On 10.11.22 at 15:37, Fiona Ebner wrote:
>>> @@ -206,11 +207,30 @@ my $valid_service_states = {
>>>  sub recompute_online_node_usage {
>> So I was a bit worried that recompute_online_node_usage() would become
>> too inefficient with the new add_service_usage_to_node() overhead from
>> needing to read the guest configs. I now tested it with ~300 HA services
>> (minimal containers) running on my virtual test cluster.
>>
>> Timings with 'basic' mode were between 0.0004 - 0.001 seconds
>> Timings with 'static' mode were between 0.007 - 0.012 seconds
>>
>> While about a 10-fold increase, it's not too dramatic at least. I guess
>> that's what the caching of cfs files is for :)
>>
>> Still, the function is currently not only called in the main loop in
>> manage(), but also in next_state_recovery() and change_service_state().
>>
>> With, say, 400 HA services each on 5 nodes, if a node fails there's
>> 400 calls from changing to freeze
> 
> huh, freeze should only happen on graceful shutdown of a node, not
> if it fails?

Sorry, I meant fence, not freeze.

> 
>> 400 calls from changing to recovery
>> 400 calls in next_state_recovery
>> 400 calls from changing to started
>> If we take a generous estimate that each call takes 0.1 seconds (there's
>> 2000 services in total), that's 40+80+40 seconds in 3 bursts during the
>> fencing and recovery period.
> 
> doesn't that lead to overly long run windows between watchdog updates?
> 
>>
>> Is that acceptable? Should I try to optimize how often the function is
>> called?
>>
> 
> hmm, a quick look wouldn't hurt, but not required for now IMO - if it can
> interfere with watchdog updates I'd sneak in updating it once in between
> though.
> 

Yes, from a quick look that might become a problem, exactly because the
delays happen in bursts (all services change state in a single manage()
run).

Not sure how you would trigger the update, because that would need to
happen in the CRM AFAIU?

There is a fixme comment in CRM.pm's work() to set an alert timer and
enforce working for at most $max_time seconds. That would of course help
here.
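
To illustrate the idea of bounding the work per cycle, here is a minimal
sketch in Python (not the actual Perl code in CRM.pm). It uses a
cooperative deadline check between services rather than a signal-based
timer, and process_with_budget(), handle and the clock parameter are
hypothetical names made up for this example. Services that do not fit
into the budget are returned, so the caller could update the watchdog
and continue with them in the next cycle.

```python
import time


def process_with_budget(services, handle, max_time, clock=time.monotonic):
    """Handle services until roughly max_time seconds have passed.

    Hypothetical helper: the deadline is only checked between services,
    so a single slow handle() call can still overrun the budget.
    Returns the services that were not handled in this run.
    """
    deadline = clock() + max_time
    pending = list(services)
    while pending and clock() < deadline:
        handle(pending.pop(0))
    return pending
```

Passing the clock as a parameter is only there to make the sketch easy
to test with a fake time source.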

Getting rid of superfluous recompute_online_node_usage() calls should
also be feasible. We'd need to ensure that we add service usage (already
done in recovery and next_state_started) and remove service usage
(removing is not implemented right now) when changing nodes or states.
Then it'd be enough to call recompute_online_node_usage() once per cycle,
which would be a huge improvement compared to now. Additionally, we could
call it whenever we have iterated over a certain number of services, just
to be sure.
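
A rough sketch of that bookkeeping, in Python rather than the Perl used
in ha-manager, and with a plain per-node service count standing in for
the real usage data. The NodeUsage class, its methods and the
recompute_every threshold are all assumptions for illustration, not
existing ha-manager code.

```python
from collections import defaultdict


class NodeUsage:
    """Hypothetical incremental usage tracker (services per node)."""

    def __init__(self, recompute_every=100):
        self.usage = defaultdict(int)
        self.recompute_every = recompute_every
        self.changes_since_recompute = 0
        self.recompute_calls = 0  # only tracked to show how rare full rebuilds are

    def recompute(self, placements):
        """Full rebuild; placements maps service ID -> node name."""
        self.usage = defaultdict(int)
        for node in placements.values():
            self.usage[node] += 1
        self.changes_since_recompute = 0
        self.recompute_calls += 1

    def move(self, placements, sid, new_node):
        """Remove usage from the old node, add it to the new one."""
        old_node = placements.get(sid)
        if old_node is not None:
            self.usage[old_node] -= 1
        self.usage[new_node] += 1
        placements[sid] = new_node
        # Safety net: rebuild from scratch after a fixed number of changes.
        self.changes_since_recompute += 1
        if self.changes_since_recompute >= self.recompute_every:
            self.recompute(placements)
```

With 400 services moving off one node and recompute_every set to 100,
this does 4 full rebuilds during the moves instead of 400.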

> 
> ps. maybe you can have some of that info/stats here in the commit message
> of this patch.

Sure.



