* [pve-devel] last training week student feedback/request
@ 2022-06-23  8:25 DERUMIER, Alexandre
From: DERUMIER, Alexandre @ 2022-06-23  8:25 UTC (permalink / raw)
  To: pve-devel

Hi,

I just finished my Proxmox training week.

Here are some student requests and feedback:


1)

We have a use case, with an HA-enabled cluster, where a student needs to
shut down the cluster cleanly through the API or a script (unplanned
electrical shutdown, through a UPS with NUT).

He wants to cleanly stop all the VMs, then all the nodes.

Simply shutting down the nodes one by one doesn't work, because some
nodes can lose quorum when half of the cluster is already shut down, so
HA gets stuck and nodes can be fenced by the watchdog.

We looked at cleanly stopping all the VMs first. The pve-guests service
can't be used for HA-managed guests, so we wrote a script that loops over
all the VMs and runs "qm stop" on each. The problem is that the HA state
of the VMs then changes to stopped, so when the servers restart after the
maintenance we have to script a "qm start" of the VMs again.
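
To illustrate, the stop loop was roughly of this form (rough, untested
sketch; it simply parses the default "qm list" output on each node):

#!/bin/bash
# stop every running VM on this node; with HA enabled their HA state
# then ends up as "stopped"
for vmid in $(qm list | awk 'NR>1 && $3 == "running" {print $1}'); do
    qm stop "$vmid"
done
# after the maintenance the inverse has to be scripted again, e.g.:
#   for vmid in $(qm list | awk 'NR>1 {print $1}'); do qm start "$vmid"; done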


The student asked whether it would be possible to add some kind of
"cluster maintenance" option that disables HA on the whole cluster
(pause/stop all pve-ha-crm/pve-ha-lrm instances + disable the watchdog)
and temporarily removes all VM services from HA.


I think it could also be useful when adding new nodes to the cluster,
where a misbehaving new corosync node could impact the whole cluster.


Also related to this, a "node maintenance" option like VMware's would be
great too (automatic VM eviction with live migration), for example when a
user needs to change the network configuration without shutting down the
node.
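
Today one would approximate this by hand, e.g. with something along these
lines (rough sketch; the target node has to be picked manually):

#!/bin/bash
# live-migrate every running VM away from this node before maintenance
TARGET="$1"    # name of the node that should take the guests
for vmid in $(qm list | awk 'NR>1 && $3 == "running" {print $1}'); do
    qm migrate "$vmid" "$TARGET" --online
    # HA-managed guests should be moved via the HA stack instead:
    #   ha-manager migrate vm:"$vmid" "$TARGET"
done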


2) 
Another student has a PCI passthrough use case: a cluster with multiple
nodes, each with multiple PCI cards. He's using HA and has 1 or 2 backup
nodes with a lot of cards, to be able to fail over 10 other servers.

The problem is that on the backup nodes the PCI addresses of the cards
are not always the same as on the production nodes, so HA can't work.

I think it would be great to add some kind of "shared local device pool"
at the datacenter level, where we could define something like:

pci:     poolname
         node1:pciaddress
         node2:pciaddress

usb:     poolname
         node1:usbport
         node2:usbport
         

so we could dynamically choose the correct PCI address when restarting
the VM.
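
As a stopgap one can already script that choice today, e.g. something
like this (purely hypothetical sketch: the mapping file path and format
are made up, only "qm set --hostpci0" and "qm start" exist as-is):

#!/bin/bash
# hypothetical: look up this node's PCI address in a shared mapping file
# with one "<nodename> <pciaddress>" pair per line, fix up the VM config,
# then start the VM
VMID="$1"
MAPFILE="/etc/pve/gpu-pool.map"    # made-up path; /etc/pve is shared via pmxcfs
ADDR=$(awk -v n="$(hostname)" '$1 == n {print $2}' "$MAPFILE")
if [ -n "$ADDR" ]; then
    qm set "$VMID" --hostpci0 "$ADDR"
    qm start "$VMID"
fi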

Permissions could be added too, and maybe a "migratable" option once mdev
live migration support is ready, ...


3)
Related to 2), another student needs live migration with an NVIDIA card
using mdev. I'm currently testing to see whether it's possible, as there
are some experimental VFIO options to enable it, but it doesn't seem to
be ready.



4)
Multi-cluster management (I got a lot of requests about this one at the
Proxmox Days conference too).

Use case: global management, but maybe more importantly disaster recovery
(1 active cluster + 1 passive cluster): being able to use ZFS replication
or Ceph mirroring between 2 Proxmox clusters and replicate the VM
configuration.


5)
All my students have the Windows stuck-on-reboot problem since migrating
to Proxmox 7. (I have the problem too, randomly; I'm currently trying to
debug it.)

6) 

PBS: all students are using PBS, and it's working very well.

Some users have fast NVMe in production and slower HDDs for PBS on a
remote site.

A student asked whether it would be possible to add some kind of write
cache on a local PBS with fast NVMe, forwarding to the slower remote PBS
(without needing a full PBS datastore on NVMe at the local site).





* Re: [pve-devel] last training week student feedback/request
@ 2022-06-23  8:37 ` Dominik Csapak
From: Dominik Csapak @ 2022-06-23  8:37 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

On 6/23/22 10:25, DERUMIER, Alexandre wrote:
> Hi,
> 
> I just finished my Proxmox training week.
> 
> Here are some student requests and feedback:

Hi,

I'll just answer the points I'm currently involved in, so someone else
might answer the other ones ;)

[snip]
> 2)
> Another student has a PCI passthrough use case: a cluster with multiple
> nodes, each with multiple PCI cards. He's using HA and has 1 or 2 backup
> nodes with a lot of cards, to be able to fail over 10 other servers.
> 
> The problem is that on the backup nodes the PCI addresses of the cards
> are not always the same as on the production nodes, so HA can't work.
> 
> I think it would be great to add some kind of "shared local device pool"
> at the datacenter level, where we could define something like:
> 
> pci:     poolname
>           node1:pciaddress
>           node2:pciaddress
> 
> usb:     poolname
>           node1:usbport
>           node2:usbport
>           
> 
> so we could dynamically choose the correct PCI address when restarting
> the VM.
> 
> Permissions could be added too, and maybe a "migratable" option once
> mdev live migration support is ready, ...

I was working on that last year, but got held up with other stuff; I'm
planning to pick it up again this/next week.

My solution looked very similar to yours, with additional fields to
uniquely identify the card (to prevent accidental passthrough when the
address changes, for example).

Permissions are also planned there...

> 
> 
> 3)
> Related to 2), another student needs live migration with an NVIDIA card
> using mdev. I'm currently testing to see whether it's possible, as there
> are some experimental VFIO options to enable it, but it doesn't seem to
> be ready.
> 

Would be cool; I'd like to have some vGPU-capable cards to test here,
but so far no luck (also, access to and support of the vGPU driver from
NVIDIA is probably the bigger problem AFAICS).

kind regards
Dominik





* Re: [pve-devel] last training week student feedback/request
@ 2022-06-23 11:27 ` Thomas Lamprecht
From: Thomas Lamprecht @ 2022-06-23 11:27 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

Hi,

On 23/06/2022 at 10:25, DERUMIER, Alexandre wrote:
> 1)
> 
> We have a use case, with an HA-enabled cluster, where a student needs to
> shut down the cluster cleanly through the API or a script (unplanned
> electrical shutdown, through a UPS with NUT).
> 
> He wants to cleanly stop all the VMs, then all the nodes.
> 
> Simply shutting down the nodes one by one doesn't work, because some
> nodes can lose quorum when half of the cluster is already shut down, so
> HA gets stuck and nodes can be fenced by the watchdog.
> 
> We looked at cleanly stopping all the VMs first. The pve-guests service
> can't be used for HA-managed guests, so we wrote a script that loops over
> all the VMs and runs "qm stop" on each. The problem is that the HA state
> of the VMs then changes to stopped, so when the servers restart after the
> maintenance we have to script a "qm start" of the VMs again.
> 
> 
> The student asked whether it would be possible to add some kind of
> "cluster maintenance" option that disables HA on the whole cluster
> (pause/stop all pve-ha-crm/pve-ha-lrm instances + disable the watchdog)
> and temporarily removes all VM services from HA.
> 
> 
> I think it could also be useful when adding new nodes to the cluster,
> where a misbehaving new corosync node could impact the whole cluster.

We talked about something like that in our internal chat a while ago:

> 
> For HA it would basically be a maintenance mode that the master node
> propagates, without any service daemon stops/starts or the like (just as
> dangerous too), which can then be handled live. The status could display
> the difference between "currently entering" and "maintenance active"
> (once all LRMs have switched their state correctly). Additionally, one
> could imagine having two different modes: an "ignore every service
> command" one and an "unsafe, redirect service commands as if there isn't
> any HA active" one.

Whether this is then done automatically on cluster node join is a bit of
a separate question, but it should be relatively easy to add.

> 
> 
> Also related to this, a "node maintenance" option like VMware's would be
> great too (automatic VM eviction with live migration), for example when a
> user needs to change the network configuration without shutting down the
> node.
> 
> 
> 2) 
> Another student has a PCI passthrough use case: a cluster with multiple
> nodes, each with multiple PCI cards. He's using HA and has 1 or 2 backup
> nodes with a lot of cards, to be able to fail over 10 other servers.

See Dominik's RFC:
https://lists.proxmox.com/pipermail/pve-devel/2021-June/048862.html

It should be possible to get that in for 7.3.

> 
> 5)
> All my students have the Windows stuck-on-reboot problem since migrating
> to Proxmox 7. (I have the problem too, randomly; I'm currently trying to
> debug it.)

Yeah, reproducing this is really hard and is the main issue holding up a
fix. Added to that, there seems to be more than one problem (stuck vs.
crash) that we're trying to investigate in parallel.

Our slightly questionable reproducer for the crash one showed that the
issues started with 5.15 (5.14 and its stable releases seem to be fine,
albeit it's hard to tell for sure), and we can only trigger it on a
machine with an outdated BIOS (a carbon copy of that host with a newer
BIOS won't trigger it).

> 
> 6) 
> 
> PBS: all students are using PBS, and it's working very well.
> 
> Some users have fast NVMe in production and slower HDDs for PBS on a
> remote site.
> 
> A student asked whether it would be possible to add some kind of write
> cache on a local PBS with fast NVMe, forwarding to the slower remote PBS
> (without needing a full PBS datastore on NVMe at the local site).

Hmm, to understand correctly, basically: a daemon that runs locally and
sits in between the client and the remote PBS. It allows writing the new
chunks locally, returning relatively quickly to QEMU/the client, and sends
the chunks to the actual remote backing store in the background. If it's
full, it would stall until a few chunks have been sent out and can be
removed, and it would also stall the worker tasks until all chunks have
been flushed at the end of the backup.

It seems like it would add quite a bit of complexity though, and would
mostly be helpful when the PBS is really remote, with low link speed and
higher latency, not just on the LAN. IMO it's better to use a full-blown
PBS with a low keep-X retention setting and sync periodically to an
archive PBS with higher retention. That needs a bit more storage on the
LAN one, but is conceptually much simpler.
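
Roughly, as a sketch (datastore/remote names are just placeholders and
the option names are from memory, so better double check them against the
proxmox-backup-manager man page):

# on the fast LAN PBS: keep only a short history on the NVMe datastore
proxmox-backup-manager datastore update nvme-store --keep-last 3

# on the remote archive PBS: pull from the LAN PBS on a schedule and
# keep a longer history there
proxmox-backup-manager remote create lan-pbs --host pbs-lan.example.com \
    --auth-id 'sync@pbs' --password 'xxx' --fingerprint '<fingerprint>'
proxmox-backup-manager sync-job create lan-pull --store hdd-archive \
    --remote lan-pbs --remote-store nvme-store --schedule daily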





