From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Stefan Lendl <s.lendl@proxmox.com>
Date: Fri, 27 Oct 2023 09:39:06 +0200
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Message-ID: <330b6d23-6a0f-4041-9892-26944fb7e30d@proxmox.com>
In-Reply-To: <87v8axbjh1.fsf@gmail.com>
On 23/10/2023 at 14:40, Stefan Lendl wrote:
> I am currently working on the SDN feature. This is an initial review of
> the patch series and I am trying to make a strong case against ephemeral
> DHCP IP reservation.

Stefan Hanreich's reply to the cover letter already mentions upserts;
those would avoid basically all of these problems while still allowing
some dynamic changes.

> The current state of the patch series invokes the IPAM on every VM/CT
> start/stop to add or remove the IP from the IPAM.
> This triggers the dnsmasq config generation on the specific host with
> only the MAC/IP mapping of that particular host.
>
> From reading the discussion of the v1 patch series I understand this
> approach tries to implement the ephemeral IP reservation strategy. From
> off-list conversations with Stefan Hanreich, I agree that having
> ephemeral IP reservation coordinated by the IPAM requires us to
> re-implement DHCP functionality in the IPAM and heavily rely on syncing
> between the different services.
>
> To maintain a reliable sync we need to hook into many different places
> where the IPAM needs to be queried. Any issue in the implementation
> may lead to the IPAM and the local DHCP config state running out of
> sync, causing network issues such as duplicate IPs.

The same is true for permanent reservations: wherever that reservation
is saved needs to be in sync with the IPAM, e.g., also on backup restore
(into a new environment), if subnets change their configured CIDRs, ...

> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Having a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations. Even
> starting two VMs in parallel will be limited by this central lock. At
> boot, trying to start many VMs (ideally as many in parallel as
> possible) is limited by the central IPAM lock even further.
Cluster-wide locks are relatively cheap, especially if one avoids a long
critical section: query the IPAM while still unlocked, then read and
update the state under the lock, and if the newly received IP is already
taken, simply give up the lock and repeat. We also have a cluster-wide
lock for starting HA guests, to set the wanted ha-resource state, and
that is no issue at all: you can start/stop many orders of magnitude
more VMs than any hardware or storage could cope with.

> I argue that we shall not support ephemeral IPs altogether.
> The alternative is to make all IPAM reservations persistent.
>
> Using persistent IPs only reduces the interactions of VM/CTs with the
> IPAM to a minimum of a NIC joining a subnet and a NIC leaving a subnet.
> I am deliberately not referring to VMs because a VM may be part of
> multiple VNets or even appear multiple times in the same VNet
> (regardless of whether that is sensible).

Yeah, talking about vNICs / veths is the better term here; guests are
only indirectly relevant.

> Cases the IPAM needs to be involved:
>
> - NIC with a DHCP-enabled VNet is added to a VM config
> - NIC with a DHCP-enabled VNet is removed from a VM config
> - NIC is assigned to another bridge
>   can be treated as individual leave + join events

and:

- subnet config is changed
- vNIC changes from SDN-DHCP managed to manual, or vice versa; albeit
  that can almost be treated like a vNet leave/join too

> Cases that are explicitly not covered but may be added if desired:
>
> - Manually assign an IP address on a NIC
>   will not be automatically visible in the IPAM

This sounds like you want to save the state in the VM config, which I'm
rather skeptical about and would try hard to avoid. We also would need
to differentiate between bridges that are part of a DHCP-managed SDN and
others, as otherwise a user could set some IP but nothing would happen.

> - Manually change the MAC on a NIC
>   don't do that, you are on your own.
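Going back to the locking discussion above: the short-critical-section
pattern (query the IPAM unlocked, then validate and commit under the
lock, retrying on conflict) could be sketched roughly like this. A
minimal in-memory illustration with hypothetical names, not the actual
pve-network code:

```python
import threading

class Ipam:
    """Toy in-memory IPAM; the real PVE IPAM state is cluster-backed."""

    def __init__(self, pool):
        self.free = set(pool)          # unassigned IPs
        self.assigned = {}             # mac -> ip
        self.lock = threading.Lock()   # stands in for the cluster-wide lock

    def suggest(self):
        # Unlocked read: may race with other allocators, which is fine,
        # because the result is re-validated under the lock below.
        return next(iter(self.free), None)

    def allocate(self, mac):
        while True:
            ip = self.suggest()        # potentially slow part, done unlocked
            if ip is None:
                raise RuntimeError("pool exhausted")
            with self.lock:            # short critical section
                if ip in self.free:    # still available? commit.
                    self.free.remove(ip)
                    self.assigned[mac] = ip
                    return ip
            # Someone else grabbed that IP; give up the lock and retry.
```

The point is that the lock is only held for a dictionary update, not for
the whole lookup, so parallel VM starts contend only briefly.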
FWIW, a clone is such a change, and we have to support that; otherwise
the MAC field needs to get some warning hints or even become read-only
in the UI.

> Not handled:
> - manual changes in the IPAM
>
> Once an IP is reserved via the IPAM, the dnsmasq config can be
> generated stateless and idempotent from the PVE IPAM and is identical
> on all nodes, regardless of whether a VM/CT actually resides on that
> node or is running or stopped. This is especially useful for VM
> migration because the IP stays consistent without special
> considerations.

That should be orthogonal to the feature set, if we have all the info
saved somewhere else.
But this also speaks against having it in the VM config, as that would
mean that every node needs to parse every guest's config periodically,
which is way worse than some cluster lock and breaks our base axiom
that guests are owned by their current node, and only by that node, and
a node should not really alter behavior depending on some "foreign"
guest.

> Snapshot/revert, backup/restore, suspend/hibernate/resume cases are
> automatically covered because the IP will already be reserved for that
> MAC.

Not really: restore to another setup is broken, and one could resume a
VM after having changed the CIDRs of a subnet, making that broken too,
...

> If the admin wants to change the IP of a VM, this can be done via the
> IPAM API/UI, which will have to be implemented separately.

Providing overrides can be fine, but IMO that should all still be in
the SDN state, not per-VM state, and ideally use a common API.

> A limitation of this approach vs. dynamic IP reservation is that the
> IP range on the subnet needs to be large enough to hold all IPs of
> all, even stopped, VMs in that subnet. This is in contrast to default
> DHCP functionality, where only the number of actively running VMs is
> limited. It should be enough to mention this in the docs.
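As an aside on the stateless/idempotent generation quoted above: it
really could be as simple as rendering the IPAM's MAC-to-IP mapping into
dnsmasq `dhcp-host=` lines, identically on every node. A sketch, under
the assumption that the IPAM exposes a per-subnet MAC→IP mapping (the
function name is illustrative):

```python
def dnsmasq_ethers(ipam_state):
    """Render dnsmasq dhcp-host entries from a MAC -> IP mapping.

    Sorting the entries makes the output deterministic, so every node
    generates a byte-identical config and rewrites are idempotent.
    """
    lines = [f"dhcp-host={mac},{ip}" for mac, ip in sorted(ipam_state.items())]
    return "\n".join(lines) + "\n"
```

Since the output depends only on the IPAM state, regenerating it on any
node, any number of times, yields the same file.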
In production setups it should not matter _that_ much, but it might be
a bit of a PITA if one has a few "archived" VMs or the like; but that
alone would not be a deal-breaker.

> I will further review the code and try to implement the
> aforementioned approach.

You can naturally experiment, but I'd also try the upsert proposal from
Stefan H., as IMO that sounds like a good balance.
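For reference, the upsert idea amounts to a single idempotent IPAM
call: return the existing reservation for a MAC if there is one,
otherwise create it. Calling it on every VM start, restore, or config
change is then harmless. A rough sketch with hypothetical names, not
the eventual API:

```python
def upsert_reservation(assigned, free, mac):
    """Idempotent reservation: reuse the existing MAC -> IP mapping or
    allocate a fresh IP. Repeated calls for the same MAC are no-ops,
    so every code path can call this unconditionally."""
    if mac in assigned:                 # already reserved: no-op
        return assigned[mac]
    if not free:
        raise RuntimeError("subnet range exhausted")
    ip = free.pop()                     # new reservation
    assigned[mac] = ip
    return ip
```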