public inbox for pve-devel@lists.proxmox.com
From: Stefan Lendl <s.lendl@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Date: Fri, 27 Oct 2023 14:26:02 +0200
Message-ID: <87edhgmeud.fsf@gmail.com>
In-Reply-To: <330b6d23-6a0f-4041-9892-26944fb7e30d@proxmox.com>

Thomas Lamprecht <t.lamprecht@proxmox.com> writes:

> Am 23/10/2023 um 14:40 schrieb Stefan Lendl:
>> I am currently working on the SDN feature.  This is an initial review of
>> the patch series and I am trying to make a strong case against ephemeral
>> DHCP IP reservation.
>
> Stefan Hanreich's reply to the cover letter already mentions upserts, those
> will avoid basically all problems while allowing for some dynamic changes.
>

I totally agree with upserts and my patches add this functionality.
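
To make "upsert" concrete, here is a minimal sketch in Python (the real
pve-network code is Perl). The class `ToyIpam` and the method
`add_or_get_ip` are hypothetical names for illustration only, not the API
of the patches: requesting an IP for a MAC that already has one returns the
existing IP instead of failing or allocating a second one.

```python
import ipaddress


class ToyIpam:
    """In-memory IPAM for a single subnet (hypothetical, illustration only)."""

    def __init__(self, cidr):
        self.subnet = ipaddress.ip_network(cidr)
        self.by_mac = {}  # lower-cased MAC -> IP address object

    def add_or_get_ip(self, mac):
        """Upsert: return the IP already mapped to this MAC, else allocate one."""
        mac = mac.lower()
        if mac in self.by_mac:
            return self.by_mac[mac]
        used = set(self.by_mac.values())
        for ip in self.subnet.hosts():
            if ip not in used:
                self.by_mac[mac] = ip
                return ip
        raise RuntimeError(f"no free IP left in {self.subnet}")
```

Calling `add_or_get_ip` twice for the same MAC is therefore harmless, which
is what makes repeated calls from hooks safe.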

>> The current state of the patch series invokes the IPAM on every VM/CT
>> start/stop to add or remove the IP from the IPAM.
>> This triggers the dnsmasq config generation on the specific host with
>> only the MAC/IP mapping of that particular host.
>>
>> From reading the discussion of the v1 patch series I understand this
>> approach tries to implement the ephemeral IP reservation strategy. From
>> off-list conversations with Stefan Hanreich, I agree that having
>> ephemeral IP reservation coordinated by the IPAM requires us to
>> re-implement DHCP functionality in the IPAM and heavily rely on syncing
>> between the different services.
>>
>> To maintain reliable sync we need to hook into many different places
>> where the IPAM needs to be queried.  Any issues with the implementation
>> may lead to IPAM and DHCP local config state running out of sync, causing
>> network issues such as duplicate IPs.
>
> The same is true for permanent reservations, wherever that reservation is
> saved needs to be in sync with IPAM, e.g., also on backup restore (into a
> new env), if subnets change their configured CIDRs, ...
>

Yes, agreed, but there are arguably fewer states and situations that need
to be kept in sync.

The current implementation has a different state per node and depends
on the online/offline state of the guest.

It is currently not allowed to change the CIDR of a subnet.

>>
>> Furthermore, every interaction with the IPAM requires a cluster-wide
>> lock on the IPAM. Having a central cluster-wide lock on every VM
>> start/stop/migrate will significantly limit parallel operations.  Even
>> starting two VMs in parallel will be limited by this central lock. At
>> boot, trying to start many VMs (ideally as many in parallel as possible)
>> is limited by the central IPAM lock even further.
>
> Cluster-wide locks are relatively cheap, especially if one avoids having
> a long critical section, i.e., query IPAM while still unlocked, then
> read and update the state locked, if the newly received IP is already
> in there then simply give up the lock again and repeat.
>
> We also have a cluster-wide lock for starting HA guests, to set the
> wanted ha-resource state, that is no issue at all, you can start/stop
> many orders of magnitude more VMs than any HW/Storage could cope with.
>
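
The short-critical-section pattern described in the quoted paragraph above
can be sketched as follows. This is an illustration under assumptions, not
code from the series: `threading.Lock` merely stands in for the cluster-wide
lock, and `state`, `query_free_ip` and `reserve` are hypothetical names.

```python
import threading

cluster_lock = threading.Lock()  # stand-in for the cluster-wide lock
state = {}                       # shared reservation state: ip -> mac


def query_free_ip(candidates, taken):
    """Potentially slow lookup; runs WITHOUT holding the lock."""
    for ip in candidates:
        if ip not in taken:
            return ip
    return None


def reserve(mac, candidates, max_retries=10):
    """Pick an IP unlocked, then commit it inside a short locked section."""
    for _ in range(max_retries):
        snapshot = set(state)          # unlocked read, may already be stale
        ip = query_free_ip(candidates, snapshot)
        if ip is None:
            raise RuntimeError("pool exhausted")
        with cluster_lock:             # critical section: re-check and write
            if ip not in state:
                state[ip] = mac
                return ip
        # The IP was taken in the meantime: the lock is released, try again.
    raise RuntimeError("too many retries")
```

Only the re-check and the write happen under the lock, so the time the lock
is held does not depend on how slow the IPAM query is.
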
>>
>> I argue that we shall not support ephemeral IPs altogether.
>> The alternative is to make all IPAM reservations persistent.
>
>
>>
>> Using persistent IPs only reduces the interactions of VM/CTs with the
>> IPAM to a minimum of NIC joining a subnet and NIC leaving a subnet. I am
>> deliberately not referring to VMs because a VM may be part of multiple
>> VNets or even multiple times in the same VNet (regardless if that is
>> sensible).
>
> Yeah, talking about vNICs / veth's is the better term here, guests are
> only indirectly relevant.
>
>>
>> Cases the IPAM needs to be involved:
>>
>> - NIC with DHCP enabled VNet is added to VM config
>> - NIC with DHCP enabled VNet is removed from VM config
>> - NIC is assigned to another Bridge
>>   can be treated as individual leave + join events
>
> and:
>
> - subnet config is changed
> - vNIC changes from SDN-DHCP managed to manual, or vice versa
>   Albeit that can almost be treated like vNet leave/join though
>
>
>> Cases that are explicitly not covered but may be added if desired:
>>
>> - Manually assign an IP address on a NIC
>>   will not be automatically visible in the IPAM
>
> This sounds like you want to save the state in the VM config, which I'm
> rather skeptical about, and would try hard to avoid. We also would need
> to differentiate between bridges that are part of DHCP-managed SDN and
> others, as else a user could set some IP but nothing would happen.
>

I am sorry, my explanation was not clear here. I do not want to store the
IP inside the VM config.  I agree that this would not be ideal.  If a user
configures an IP from inside the VM, we have no way of tracking that IP.

For now, every added vNIC gets an IP from the IPAM, and if the guest is
configured to use DHCP, it will get this IP from the DHCP server.

If the user decides to manually configure the IP, they will have to
reserve it in the IPAM and mark the IP as "manual".
This will prevent the IPAM from allocating the IP again and keep the
IP/MAC mapping even if the VM is destroyed.

This is not implemented yet, but sketched out with Mira off-list.
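
Since this is not implemented yet, the following is only a rough sketch of
the intended behavior, in Python for brevity. Everything in it
(`ManualAwareIpam`, `Reservation`, the `manual` flag and the method names)
is a hypothetical illustration, not a proposed API.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    mac: str
    manual: bool = False  # True: pinned by the admin, survives guest removal


class ManualAwareIpam:
    """Toy IPAM that distinguishes automatic from manual reservations."""

    def __init__(self):
        self.entries = {}  # ip (str) -> Reservation

    def reserve_auto(self, ip, mac):
        """Reservation created when a vNIC joins a DHCP-enabled VNet."""
        if ip in self.entries:
            raise ValueError(f"{ip} is already reserved")
        self.entries[ip] = Reservation(mac=mac)

    def reserve_manual(self, ip, mac):
        """Admin pins an IP that is configured by hand inside the guest."""
        existing = self.entries.get(ip)
        if existing is not None and existing.mac != mac:
            raise ValueError(f"{ip} is already reserved for {existing.mac}")
        self.entries[ip] = Reservation(mac=mac, manual=True)

    def release_for_mac(self, mac):
        """vNIC removed or guest destroyed: drop only automatic entries."""
        self.entries = {
            ip: r for ip, r in self.entries.items()
            if r.mac != mac or r.manual
        }

    def is_free(self, ip):
        return ip not in self.entries
```

With this split, destroying a guest frees its automatically assigned IPs,
while "manual" entries stay reserved until the admin removes them.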

>> - Manually change the MAC on a NIC
>>   don't do that > you are on your own.
>
> FWIW, a clone is such a change, and we have to support that, otherwise
> the MAC field needs to get some warning hints or even become read-only
> in the UI.
>
>>   Not handled > change in IPAM manually
>>
>> Once an IP is reserved via IPAM, the dnsmasq config can be generated
>> stateless and idempotent from the pve IPAM and is identical on all nodes
>> regardless if a VM/CT actually resides on that node or is running or
>> stopped.  This is especially useful for VM migration because the IP
>> stays consistent without special consideration.
>
> That should be orthogonal to the feature set, if we have all the info
> saved somewhere else
>
> But this also speaks against having it in the VM config, as that would
> mean that every node needs to parse every guest's config periodically,
> which is way worse than some cluster lock and breaks with our base
> axiom that guests are owned by their current node, and only by that,
> and a node should not really alter behavior dependent on some "foreign"
> guest.
>
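
To illustrate what "stateless and idempotent" generation from the IPAM means
in the quoted paragraph above, here is a small sketch. It assumes the
MAC/IP mappings are rendered as dnsmasq `dhcp-host=<mac>,<ip>` lines; the
file format the patches actually write may differ, and
`render_dhcp_hosts` is a hypothetical helper name.

```python
def render_dhcp_hosts(mappings):
    """Render dnsmasq dhcp-host lines from a MAC -> IP mapping.

    Pure function: the output depends only on the IPAM content, not on the
    node it runs on or on whether a guest is running there.
    """
    ordered = sorted(mappings.items(), key=lambda kv: kv[0].lower())
    return "".join(f"dhcp-host={mac.lower()},{ip}\n" for mac, ip in ordered)
```

Because the entries are sorted and normalized, every node that reads the
same IPAM content produces byte-identical output, and regenerating the
config any number of times changes nothing.
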
>>
>> Snapshot/revert, backup/restore, suspend/hibernate/resume cases are
>> automatically covered because the IP will already be reserved for that
>> MAC.
>
> Not really, restore to another setup is broken, one could resume the
> VM after having changed CIDRs of a subnet, making that broken too, ...
>
>>
>> If the admin wants to change the IP of a VM, this can be done via the
>> IPAM API/UI which will have to be implemented separately.
>
> Providing Overrides can be fine, but IMO that all should be still in
> the SDN state, not per-VM one, and ideally use a common API.
>
>
>> A limitation of this approach vs dynamic IP reservation is that the IP
>> range on the subnet needs to be large enough to hold all IPs of all,
>> even stopped, VMs in that subnet. This is in contrast to default DHCP
>> functionality where only the number of actively running VMs is limited.
>> It should be enough to mention this in the docs.
>
> In production setups it should not matter _that_ much, but it might
> be a bit of a PITA if one has a few "archived" VMs or the like, but
> that alone would
>
>>
>> I will further review the code and try to implement the aforementioned
>> approach.
>
> You can naturally experiment, but I'd also try the upsert proposal from
> Stefan H., as IMO that sounds like a good balance.




