From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Stefan Lendl <s.lendl@proxmox.com>
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Date: Fri, 27 Oct 2023 09:39:06 +0200
Message-ID: <330b6d23-6a0f-4041-9892-26944fb7e30d@proxmox.com>
In-Reply-To: <87v8axbjh1.fsf@gmail.com>

Am 23/10/2023 um 14:40 schrieb Stefan Lendl:
> I am currently working on the SDN feature.  This is an initial review of
> the patch series and I am trying to make a strong case against ephemeral
> DHCP IP reservation.

Stefan Hanreich's reply to the cover letter already mentions upserts; those
will avoid basically all problems while still allowing some dynamic changes.
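
As a rough illustration of what an upsert means here (a sketch only, not the
code from the series): a reservation request for a MAC that already has an IP
returns that IP, and only an unknown MAC gets a new one. The function name,
the plain dict standing in for the IPAM state and the `free_ips` list are all
hypothetical.

```python
def upsert_reservation(ipam, mac, free_ips):
    """Return the IP already mapped to `mac`, else reserve the first free one.

    `ipam` is a plain dict (MAC -> IP) standing in for the real IPAM state;
    `free_ips` is the ordered list of candidate addresses. Both are assumptions.
    """
    mac = mac.lower()
    if mac in ipam:
        # Existing mapping: nothing changes, repeated calls are harmless.
        return ipam[mac]
    used = set(ipam.values())
    for ip in free_ips:
        if ip not in used:
            ipam[mac] = ip
            return ip
    raise RuntimeError("no free IP left in range")
```

Calling this on every start is then safe, because a second call for the same
vNIC neither allocates a new address nor modifies the state.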

> The current state of the patch series invokes the IPAM on every VM/CT
> start/stop to add or remove the IP from the IPAM.
> This triggers the dnsmasq config generation on the specific host with
> only the MAC/IP mapping of that particular host.
> 
> From reading the discussion of the v1 patch series I understand this
> approach tries to implement the ephemeral IP reservation strategy. From
> off-list conversations with Stefan Hanreich, I agree that having
> ephemeral IP reservation coordinated by the IPAM requires us to
> re-implement DHCP functionality in the IPAM and heavily rely on syncing
> between the different services.
> 
> To maintain reliable sync we need to hook into many different places
> where the IPAM needs to be queried.  Any issues with the implementation
> may lead to IPAM and DHCP local config state running out of sync, causing
> network issues such as duplicate IPs.

The same is true for permanent reservations, wherever that reservation is
saved needs to be in sync with IPAM, e.g., also on backup restore (into a
new env), if subnets change their configured CIDRs, ...

> 
> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Having a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations.  Even
> starting two VMs in parallel will be limited by this central lock. At
> boot trying to start many VMs (ideally as many in parallel as possible)
> is limited by the central IPAM lock even further.

Cluster-wide locks are relatively cheap, especially if one avoids having
a long critical section, i.e., query the IPAM while still unlocked, then
read and update the state while locked; if the newly received IP is already
in there, then simply give up the lock again and repeat.
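
A minimal sketch of that pattern, under stated assumptions: `threading.Lock`
merely stands in for the cluster-wide lock, `state` is a hypothetical shared
MAC -> IP mapping, and `propose_ip` is a hypothetical callable representing
the (possibly slow) IPAM query.

```python
import threading

cluster_lock = threading.Lock()  # stand-in for a cluster-wide lock (assumption)
state = {}                       # hypothetical shared MAC -> IP state


def reserve(mac, propose_ip, max_tries=16):
    """Reserve an IP for `mac` while holding the lock only briefly."""
    for _ in range(max_tries):
        ip = propose_ip()        # slow query happens while still unlocked
        with cluster_lock:       # short critical section: read, check, update
            if mac in state:
                return state[mac]
            if ip not in state.values():
                state[mac] = ip
                return ip
        # Candidate was taken in the meantime: lock is released, try again.
    raise RuntimeError("could not reserve an IP, giving up")
```

The lock is only held for a dictionary lookup and an assignment, so the
expensive part never serializes parallel guest starts.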

We also have a cluster-wide lock for starting HA guests, to set the
wanted ha-resource state; that is no issue at all, you can start/stop
many orders of magnitude more VMs than any HW/Storage could cope with.

> 
> I argue that we shall not support ephemeral IPs altogether.
> The alternative is to make all IPAM reservations persistent.


> 
> Using persistent IPs only reduces the interactions of VM/CTs with the
> IPAM to a minimum of NIC joining a subnet and NIC leaving a subnet. I am
> deliberately not referring to VMs because a VM may be part of multiple
> VNets or even multiple times in the same VNet (regardless if that is
> sensible).

Yeah, vNICs / veths is the better term here, guests are
only indirectly relevant.

> 
> Cases the IPAM needs to be involved:
> 
> - NIC with DHCP enabled VNet is added to VM config
> - NIC with DHCP enabled VNet is removed from VM config
> - NIC is assigned to another Bridge
>   can be treated as individual leave + join events

and:

- subnet config is changed
- vNIC changes from SDN-DHCP managed to manual, or vice versa
  albeit that can almost be treated like a vNet leave/join

 
> Cases that are explicitly not covered but may be added if desired:
> 
> - Manually assign an IP address on a NIC
>   will not be automatically visible in the IPAM

This sounds like you want to save the state in the VM config, which I'm
rather skeptical about, and would try hard to avoid. We also would need
to differentiate between bridges that are part of DHCP-managed SDN and
others, as otherwise a user could set some IP but nothing would happen.

> - Manually change the MAC on a NIC
>   don't do that > you are on your own.

FWIW, a clone is such a change, and we have to support that, otherwise
the MAC field needs to get some warning hints or even become read-only
in the UI.

>   Not handled > change in IPAM manually
> 
> Once an IP is reserved via IPAM, the dnsmasq config can be generated
> stateless and idempotent from the pve IPAM and is identical on all nodes
> regardless if a VM/CT actually resides on that node or is running or
> stopped.  This is especially useful for VM migration because the IP
> stays consistent without special consideration.

That should be orthogonal to the feature set, if we have all the info
saved somewhere else.

But this also speaks against having it in the VM config, as that would
mean that every node needs to parse every guest's config periodically,
which is way worse than some cluster lock and breaks with our base
axiom that guests are owned by their current node, and only by that,
and a node should not really alter behavior dependent on some "foreign"
guest.
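
To illustrate the stateless, idempotent generation from centrally saved
reservations mentioned in the quote (a sketch, not the plugin code from the
series): the output depends only on the MAC -> IP mapping, so every node
renders the same file without looking at any guest config. The function name
and the dict input are hypothetical; the line layout follows the usual
ethers(5) "address value" form.

```python
def render_ethers(reservations):
    """Render MAC -> IP reservations as ethers-style lines.

    `reservations` is a hypothetical dict taken from the central state.
    Sorting makes the output independent of insertion order, so rendering
    the same state always yields byte-identical content.
    """
    lines = [
        f"{mac.lower()} {ip}"
        for mac, ip in sorted(reservations.items(), key=lambda kv: kv[0].lower())
    ]
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Because the result is a pure function of the saved reservations, it can be
regenerated at any time and on any node without coordination.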

> 
> Snapshot/revert, backup/restore, suspend/hibernate/resume cases are
> automatically covered because the IP will already be reserved for that
> MAC.

Not really, restore to another setup is broken, one could resume the
VM after having changed CIDRs of a subnet, making that broken too, ...

> 
> If the admin wants to change the IP of a VM, this can be done via the
> IPAM API/UI which will have to be implemented separately.

Providing overrides can be fine, but IMO that should all still be in
the SDN state, not a per-VM one, and ideally use a common API.


> A limitation of this approach vs dynamic IP reservation is that the IP
> range on the subnet needs to be large enough to hold all IPs of all,
> even stopped, VMs in that subnet. This is in contrast to default DHCP
> functionality where only the number of actively running VMs is limited.
> It should be enough to mention this in the docs.

In production setups it should not matter _that_ much, but it might
be a bit of a PITA if one has a few "archived" VMs or the like, but
that alone would

> 
> I will further review the code and try to implement the aforementioned
> approach.

You can naturally experiment, but I'd also try the upsert proposal from
Stefan H., as IMO that sounds like a good balance.



