public inbox for pve-devel@lists.proxmox.com
* [pve-devel] proxmox French days conference feedback
@ 2022-06-23  8:43 DERUMIER, Alexandre
  2022-06-23 10:59 ` Thomas Lamprecht
  0 siblings, 1 reply; 3+ messages in thread
From: DERUMIER, Alexandre @ 2022-06-23  8:43 UTC (permalink / raw)
  To: pve-devel

Hi,

Two weeks ago we organised two days of Proxmox VE/Ceph conferences
at the university in Clermont-Ferrand, France.

This was organised by the university and the CNRS (the French National
Centre for Scientific Research).

Proxmox is widely used in these scientific departments in France,
and we had people coming from all over France.

70 people on site, 300 people on the stream.

The purpose of the conference was to exchange Proxmox experience
and to show the French government and other public departments
that Proxmox VE is a viable, working solution for virtualisation.

(There is still a lot of VMware lobbying, but with the upcoming Broadcom
acquisition and lower budgets, a lot of migrations are planned.)

Video replays are available here (in French only, sorry):

https://indico.mathrice.fr/event/327/


So the conference was a success.
Overall experience with Proxmox VE/PBS is really good.
Nobody had serious problems with PVE/PBS.

Some are coming from OpenStack (too big, too complex to manage).
Some others are coming from VMware.
Some others have been using Proxmox since 0.9 ;)
There was some experience feedback on Ceph too (with some problems).


The most requested missing features were probably:

- cross-cluster management + replication/disaster recovery.  (They have
a lot of dual-room / dual-site setups, but 3 sites to keep quorum is not
always possible.)

- a DRS feature like VMware's for VM balancing
(I'm still working on it, I'll try to have a working version after this
 summer. I still need the pending VM pressure stats patches to be applied
 first ;)

- Pool quota (restrict the total mem/cpu/disk allocated to all VMs in a
pool), as they have some clusters shared between different
departments / students / ...


A big thanks to Daniela for the T-shirts and other goodies !

Also, another recurring question:

"Why is the Proxmox team not more present at events/conferences
like FOSDEM, ...?"

Yes, we would like to drink some beers with you guys ;)

Regards,

Alexandre


* Re: [pve-devel] proxmox French days conference feedback
  2022-06-23  8:43 [pve-devel] proxmox French days conference feedback DERUMIER, Alexandre
@ 2022-06-23 10:59 ` Thomas Lamprecht
  2022-08-24 12:59   ` DERUMIER, Alexandre
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Lamprecht @ 2022-06-23 10:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

Hi,

Am 23/06/2022 um 10:43 schrieb DERUMIER, Alexandre:
> So the conference was a success.
> Overall experience with proxmox VE/PBS is really good.
> Nobody had serious problem with PVE/PBS.
> 
> Some are coming from openstack (too big, too complex to manage)
> Some others are coming from vmware.
> Some others are using proxmox since 0.9 😉
> Some return of experience with ceph too. (with some problems).
> 
> 

Thanks for the feedback and talking about Proxmox projects!

> Maybe for the most requested missing features:
> 
> - cross cluster management + replication/disaster recovery.  (They have
> a lot of dual room / dual site.  But 3 sites to keep quorum is not
> always possible)

That's in the pipeline, cross-cluster migration is not that far off and Fabian
should soon be able to pick that up (he currently works on some infrastructure
projects to make air-gapped offline updates possible in a relatively easy and
integrated way), that's one of the biggest prerequisites left.

> 
> - a drs feature like vmware for vm balancing 
> (I'm still working on it, I'll try to have a working after this summer.
>  I still need vm pressure stats pending patches apply first 😉

I'd really like to work on this more actively from our side too. I thought about
7.3 feature planning a bit yesterday and wrote down a few (rough) key points for
tackling this, using the algorithm and rough direction you already worked on in
your proof of concepts (thx!):

- [ ] Static (and later Dynamic) Resource Scheduling (S/DRS)
    - [ ] Coordinate with Alexandre as he's working partly on that too, but we
          may want to use a bit of a different design and/or feature set (at
          least initially) and integration timeline
    - [ ] check out TOPSIS more closely and implement relevant parts in Rust to
          expose via perlmod, that's then fast and much easier to reason about
          correctness in a static, safety-focused language like Rust.
    - [ ] Make basic static resource capacity like CPU (# of socket, core and
          hyper threads) and memory available for other cluster nodes (for
          example, via kv_broadcast after (re)start of pve-cluster)
    - [ ] Add infrastructure to use that static (!) information for balancing
          out the cluster.
        - [ ] spit out a list of actions that would result in a balanced
              cluster: migrate guest A to node X, migrate guest B to node Y
        - [ ] use that for creating a simulation and regression testing system
              in the spirit of the ha-managers simulation and regression test
              system, but as independent test & executable
    - [ ] integrate in HA, due to static-ness and safe-and-slow integration it
          should at first only be done on recovery, for better balancing out.
    - [ ] add API create support for creating a CT/VM to the best fitting node,
          i.e., the lowest used one
    - [ ] make balancing algo available for non-ha too, allow a cluster wide
          manual re-balance (e.g., with action-proposal shown to user for
          confirmation)
    - [ ] Extend with dynamic information like IO/memory/CPU pressure
    - [ ] Finally: Allow to opt-in in periodic auto-balancing for HA managed

IMO the semi-static resource availability and usage would be nice in general as
first step, that could then also allow one to relatively easily pre-error/warn out
on VM start if there won't be enough memory available (maybe overridable for odd
zram/KSM cases, or when one just doesn't want to care about that and likes OOMs ;)
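
To make the "static information" idea a bit more concrete, here is a minimal
Python sketch of the two simplest uses: picking the least-used node for a new
guest, and refusing early when nothing fits. It is only an illustration under
assumptions: the function name, the dict layout and the MiB figures are made
up, only memory is considered, and nothing here is an existing PVE API.

```python
def pick_node(guest_mem_mib, nodes):
    """Return the node that stays least used after placing the guest.

    nodes maps a node name to (total_mem_mib, allocated_mem_mib); both
    values are static configuration data, not live usage (assumption).
    Returns None when the guest fits on no node, which is the case where
    a start could be refused or warned about up front.
    """
    candidates = []
    for name, (total, allocated) in nodes.items():
        if allocated + guest_mem_mib <= total:
            # Rank by the memory ratio the node would have afterwards.
            candidates.append(((allocated + guest_mem_mib) / total, name))
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical three-node cluster, values in MiB.
cluster = {
    "pve1": (65536, 49152),
    "pve2": (65536, 16384),
    "pve3": (32768, 30720),
}
print(pick_node(8192, cluster))
print(pick_node(65536, cluster))
```

A real implementation would also have to weigh CPU and an override for the
zram/KSM cases, which this sketch leaves out on purpose.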

> 
> - Pool quota (restrict the total mem/cpu/disk allocated to all vms in a
> pool). as they have some clusters shared between differents
> departements / students/ ...
> 

It'd need a bit of new infrastructure, but it wouldn't be /that/ hard to implement.

> 
> A big thanks to Daniela for the T-shirts and other goodies !
> 
> Also, another recurrent question:
> 
> "Why proxmox team is not more present on differents events/conference 
> like fosdem, ... ?"
> 

Some colleagues frequent FOSDEM, but in the last two and a half years giving
talks at or attending in-person conferences was a bit difficult.

> Yes, we would like to drink some beers with you guys 😉

Would be nice! :-)

* Re: [pve-devel] proxmox French days conference feedback
  2022-06-23 10:59 ` Thomas Lamprecht
@ 2022-08-24 12:59   ` DERUMIER, Alexandre
  0 siblings, 0 replies; 3+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-24 12:59 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion

Hi Thomas,

Sorry, I totally missed your response!

>> - a drs feature like vmware for vm balancing
>> (I'm still working on it, I'll try to have a working after this summer.
>>   I still need vm pressure stats pending patches apply first 😉
> 
> I'd really like to more actively work on this from our part too, I thought about
> 7.3 feature planning a  bit yesterday and wrote a few (rough) edge points for
> tackling this, using the alogrithm and rough direction you already worked on in
> your proof of concepts (thx!):
> 
> - [ ] Static (and later Dynamic) Resource Scheduling (S/DRS)
>      - [ ] Coordinate with Alexandre as he's working partly on that too, but we
>            may want to use a bit of a different design and/or feature set (at
>            least initially) and integration timeline

I have free time to work/help on this in the coming months, just tell me
if we can sync our work.

>      - [ ] checkout TOPSIS more closely and implement relevant parts in rust to
>            expose via perlmod, that's then fast and much easier to reason
>            correctness in static, and safety focused language like rust.

I can help if you have questions about reimplementing TOPSIS in Rust (I
have done it from scratch in Perl following the YouTube math tutorial,
it's not too difficult).
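
For readers who don't know TOPSIS, the textbook steps are: normalise each
criterion, apply weights, find the ideal best and worst values, then score
each alternative by its relative distance to both. A minimal Python sketch
follows. It is neither the Perl proof of concept nor the planned Rust code;
the criteria, weights and node values are invented for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per row, higher is better.

    matrix  : rows are alternatives (e.g. candidate nodes), columns criteria
    weights : one weight per criterion
    benefit : True where a larger value is better, False where smaller is
    """
    n_crit = len(weights)
    # 1. Vector-normalise every column (guard against an all-zero column).
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_crit)
    ]
    # 2. Apply the criterion weights.
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # 3. Ideal best and ideal worst value per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 4. Euclidean distance to both ideals, then relative closeness.
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical criteria: (free memory in GiB, CPU pressure in percent).
nodes = [(64, 5), (16, 40), (32, 10)]
scores = topsis(nodes, weights=[0.5, 0.5], benefit=[True, False])
print(scores)
```

With these made-up numbers the first node is best on both criteria and the
second is worst on both, so they land at the two ends of the scale and the
third node falls in between.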

>      - [ ] Make basic static resource capacity like CPU (# of socket, core and
>            hyper threads) and memory available for other cluster nodes (for
>            example, via kv_broadcast after (re)start of pve-cluster)
>      - [ ] Add infrastructure to use that static (!) information for balancing
>            out the cluster.
>          - [ ] spit out a list of actions that would result in a balanced
>                cluster: migrate guest A to node X, migrate guest B to node Y
>          - [ ] use that for creating a simulation and regression testing system
>                in the spirit of the ha-managers simulation and regression test
>                system, but as independent test & executable
I think giving the user a manual balancing feature, with a static/preview
list of migrations and manual approval, could be great too.

>      - [ ] integrate in HA, due to static-ness and safe-and-slow integration it
>            should be first only be done on recovery, for better balancing out.
>      - [ ] add API create support for creating a CT/VM to the best fitting node,
>            i.e., the lowest used one
Yes, needed. (And maybe also start on the best fitting node.)

>      - [ ] make balancing algo available for non-ha too, allow a cluster wide
>            manual re-balance (e.g., with action-proposal shown to user for
>            confirmation)
Yes, some users have already asked me about non-HA VMs.

>      - [ ] Extend with dynamic information like IO/memory/CPU pressure

CPU pressure is really the most important metric here, because you can't
trust CPU usage. (I have some servers totally overloaded at 60% CPU usage,
and other servers not overloaded at 80% CPU usage.)

The main problem is that we only have an average value across all cores.
With a lot of cores, some cores can be stuck at 100% while others are at
10%, which gives you a low average CPU usage. But VMs that need to use a
lot of cores will still be overloaded.

That's why, in my code, I only look at CPU pressure on the source node,
and on the target node I look for low CPU pressure + CPU usage under
80%. (We can only trust CPU usage if CPU pressure is low.)
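
Roughly, that rule could be sketched like this in Python. This is only an
illustration, not my actual code: the two pressure thresholds and the
NodeStats fields are hypothetical, only the 80% usage limit comes from the
description above.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    cpu_pressure: float  # CPU pressure in percent (source and unit assumed)
    cpu_usage: float     # CPU usage averaged over all cores, in percent


HIGH_PRESSURE = 20.0     # hypothetical: above this a node counts as overloaded
LOW_PRESSURE = 5.0       # hypothetical: below this usage can be trusted
MAX_TARGET_USAGE = 80.0  # from the rule above


def overloaded_sources(nodes):
    """Source nodes are chosen on CPU pressure alone; usage is ignored."""
    return [n.name for n in nodes if n.cpu_pressure >= HIGH_PRESSURE]


def eligible_targets(nodes):
    """Targets need low pressure and, only then, usage under the limit."""
    return [
        n.name
        for n in nodes
        if n.cpu_pressure <= LOW_PRESSURE and n.cpu_usage < MAX_TARGET_USAGE
    ]


# Made-up stats mirroring the 60% / 80% examples above.
stats = [
    NodeStats("node-a", cpu_pressure=35.0, cpu_usage=60.0),
    NodeStats("node-b", cpu_pressure=2.0, cpu_usage=80.0),
    NodeStats("node-c", cpu_pressure=1.0, cpu_usage=40.0),
]
print(overloaded_sources(stats))
print(eligible_targets(stats))
```

Here node-a is a migration source despite its moderate usage, node-b is left
alone (not overloaded, but too busy to receive guests), and node-c is the
only valid target.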


>      - [ ] Finally: Allow to opt-in in periodic auto-balancing for HA managed
> 
> IMO the semi-static resource availability and usage would be nice in general as
> first step, that could then also allow one to relatively easily pre-error/warn out
> on VM start if there won't be enough memory available (maybe overridable for odd
> zram/KSM cases, or when one just doesn't want to care about that and likes OOMs ;)
> 


I have a lot of customers wanting to migrate from VMware because of the
Broadcom acquisition, but the missing DRS feature is really a blocker for
them.

If needed, we could also help to finance this feature, if you need an
extra developer.
Just ask me!