Message-ID: <3a376498-7d82-f26d-93c4-428fc34c930c@proxmox.com>
Date: Thu, 23 Jun 2022 13:27:51 +0200
From: Thomas Lamprecht
To: Proxmox VE development discussion, "DERUMIER, Alexandre"
Subject: Re: [pve-devel] last training week student feedback/request

Hi,

On 23/06/2022 at 10:25, DERUMIER, Alexandre wrote:
> 1)
>
> We have a use case, with an HA-enabled cluster, where a student needs to
> shut down the whole cluster cleanly through the API or a script
> (unplanned electrical shutdown, through a UPS with NUT).
>
> He wants to cleanly stop all the VMs, then all the nodes.
>
> Simply shutting the nodes down one by one doesn't work, because some
> nodes can lose quorum once half of the cluster is already shut down, so
> HA gets stuck and nodes can be fenced by the watchdog.
>
> We looked at cleanly stopping all the VMs first.
> The pve-guests service can't be used with HA.
> So we wrote a script that loops over all the VMs with "qm stop".
> The problem is that the HA state of the VMs then goes to "stopped", so
> when the servers come back up after the maintenance, we need to script
> a "qm start" of the VMs again.
>
> The student asked whether it would be possible to add some kind of
> "cluster maintenance" option, to disable HA on the whole cluster
> (pause/stop all pve-ha-crm/lrm + disable the watchdog) and temporarily
> remove all VM services from HA.
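Until something like that exists, a per-node wrapper along the lines of what
you describe is probably the pragmatic way out. Below is a rough, untested
sketch of what I mean; it assumes all guests are VMs managed as vm:<vmid> HA
resources, and the state-file path is just a made-up example:

#!/bin/bash
# Rough, untested sketch: cleanly stop all local VMs before a planned
# power cut and start them again afterwards. Run it on each node, as qm
# only manages local guests. Assumes VMs only (no containers) and that HA
# resources use the vm:<vmid> naming scheme; the file path is made up.

VMID_FILE=/root/running-vms.list

case "$1" in
stop-all)
    # remember which VMs were running so we can bring them back later
    qm list | awk '$3 == "running" {print $1}' > "$VMID_FILE"

    while read -r vmid; do
        # take the VM out of HA control first, so stopping it does not
        # flip its HA request state to "stopped" permanently
        ha-manager set "vm:$vmid" --state ignored 2>/dev/null || true
        qm shutdown "$vmid" || qm stop "$vmid"
    done < "$VMID_FILE"
    ;;
start-all)
    while read -r vmid; do
        qm start "$vmid"
        # hand the VM back to HA once it is running again
        ha-manager set "vm:$vmid" --state started 2>/dev/null || true
    done < "$VMID_FILE"
    ;;
*)
    echo "usage: $0 {stop-all|start-all}" >&2
    exit 1
    ;;
esac

Flipping the resources to "ignored" before stopping is what avoids the HA
state permanently ending up as "stopped", which is exactly the problem you
ran into with plain "qm stop".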
>
> I think it could be useful too when adding new nodes to the cluster,
> where a misbehaving new corosync node could impact the whole cluster.

We talked about something like that in our internal chat a while ago:

> For the HA it would basically be a maintenance mode the master node
> propagates, without any service daemon stop/starts or the like (just as
> dangerous too), that can then be handled live, and the status can display
> the "currently entering" vs. "maintenance active" (once all LRMs switched
> their state correctly) difference. Additionally, one could imagine having
> two different modes: an "ignore every service command" one and an
> "unsafe, redirect service commands as if there isn't any HA active" one.

Whether this then gets done automatically on cluster node join is a bit of
a separate question, but that should be relatively easy to add.

>
> Also, related to this, maybe a "node maintenance" option could be great
> too, like VMware has (automatic VM eviction with live migration), for
> when a user needs to change the network config, for example, without
> shutting down the node.
>
>
> 2)
> Another student has a need with PCI passthrough: a cluster with
> multiple nodes and multiple PCI cards.
> He's using HA and has 1 or 2 backup nodes with a lot of cards,
> to be able to fail over 10 other servers.

See Dominik's RFC:
https://lists.proxmox.com/pipermail/pve-devel/2021-June/048862.html

Should be possible to get that in for 7.3.

>
> 5)
> All my students have the Windows reboot-stuck problem since migrating to
> Proxmox VE 7. (I have the problem too, randomly; I'm currently trying to
> debug this.)

Yeah, reproducing this is really hard and that's the main issue holding up
a fix. Added to that, there seems to be more than one problem (stuck vs.
crash), which we're trying to investigate in parallel.

Our slightly questionable reproducer for the crash one showed that the
issues started with kernel 5.15 (5.14 and its stable releases seem to be
fine, albeit it's hard to tell for sure), and we can only trigger it on a
machine with an outdated BIOS (a carbon copy of that host with a newer BIOS
won't trigger it).

>
> 6)
>
> PBS: all students are using PBS, and it's working very well.
>
> Some users have fast NVMe in production, and slower HDDs for PBS on a
> remote site.
>
> A student asked whether it would be possible to add some kind of write
> cache on a local PBS with fast NVMe, forwarding to the remote, slower
> PBS (without the need for a full PBS datastore with NVMe on the local
> site).

Hmm, to understand correctly, basically: a daemon that runs locally and
sits in between the client and the remote PBS. It allows writing new chunks
locally, returning relatively quickly to QEMU/the client, and sends the
chunks to the actual remote backing store in the background. If it's full
it'd stall until a few chunks were sent out and can be removed, and it
would also stall the worker task until all chunks are flushed at the end of
the backup.

Seems like it would add quite a bit of complexity though, and would mostly
be helpful when the PBS is really remote, with low link speed and higher
latency, not just in the LAN.

IMO it's better to use a full-blown PBS with a low keep-x retention setting
and sync periodically to an archive PBS with higher retention. That needs a
bit more storage on the LAN one, but is conceptually much simpler.
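Just as a rough illustration of that last suggestion; the remote/datastore
names, auth-id, host and schedules below are made-up example values, and the
exact options should be double-checked against the proxmox-backup-manager
man page:

# On the (bigger, slower) archive PBS: pull from the fast LAN PBS.
# "lan-pbs", "fast-store", "archive", "sync@pbs" and the host/schedules
# are made-up example values.

# define the fast LAN PBS as a remote
proxmox-backup-manager remote create lan-pbs \
    --host pbs-lan.example.com \
    --auth-id 'sync@pbs' \
    --password 'SECRET' \
    --fingerprint '64:d3:...'       # use the real fingerprint here

# pull its datastore into the local archive datastore once a day
proxmox-backup-manager sync-job create lan-to-archive \
    --remote lan-pbs --remote-store fast-store \
    --store archive --schedule daily

# On the fast LAN (NVMe) PBS: keep only a few snapshots so it stays small
proxmox-backup-manager datastore update fast-store \
    --keep-last 3 --prune-schedule daily

As long as remove-vanished isn't enabled on the sync job, pruning
aggressively on the LAN side won't delete anything that was already synced
over to the archive side.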