Subject: Re: [pve-devel] [PATCH container 1/1] Signed-off-by: Maurice Klein
Date: Fri, 6 Feb 2026 12:22:30 +0100
From: Maurice Klein <klein@aetherus.de>
To: Stefan Hanreich, pve-devel@lists.proxmox.com
On 06.02.26 at 09:23, Stefan Hanreich wrote:
> On 2/1/26 3:31 PM, Maurice Klein wrote:
>> Basically, the vnet and subnet part is what I see as an issue.
>> Since this kind of setup requires no defined subnets, the current
>> configuration doesn't fully make sense.
>> I guess you could still have a subnet configuration and configure all
>> the host addresses inside that subnet, but it's not really necessary.
>> Every VM route would be a /32 route, and the address configured on
>> that bridge (gateway field) would also be a /32.
> We would still need a local IP on the PVE host that acts as a gateway
> and preferably an IP for the VM inside the subnet so you can route the
> traffic for the /32 IPs there. So we'd need to configure e.g.
> 192.0.2.0/24 as subnet, then have the host as gateway (e.g. 192.0.2.1)
> and each VM gets an IP inside that subnet (which could automatically be
> handled via IPAM / DHCP). Looking at other implementations (e.g.
> kube-router) there's even a whole subnet pool and each node gets one
> subnet from that pool - but that's easier done with containers than VMs,
> so I think the approach with one shared subnet seems easier
> (particularly for VM mobility).

I think I didn't explain that properly. The whole idea is to have a
gateway IP like 192.0.2.1/32 on the PVE host on that bridge and not to
have a /24 (or similar) route at all. Guests then also have addresses,
whatever they might look like. For example, a guest could have
1.1.1.1/32 - usually always a /32, although for some use cases it could
be beneficial to give a guest more than a /32, but let's put that aside
for now.

So there is no need to define which subnet a guest is on, and no need
for it to be in the same subnet as the host. The guest configures its IP
statically inside, and it would usually be a /32. On the PVE host, a
host route to 1.1.1.1/32 would then be added with the following command:

ip route add 1.1.1.1/32 dev bridgetest

The guest configuration would look like this (simplified and shortened;
a command-level sketch follows further below):

eth0:
    inet 1.1.1.1/32

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG    1.00   0        0 eth0

The biggest thing this enables: in PVE clusters, if we build for example
an iBGP full mesh, the routes get shared. There could be any topology
and the routing would adapt. Just as an example - not a great topology,
but it illustrates the point:

      GW-1        GW-2
         | \        / |
         |  \      /  |
         |   \    /   |
        pve1--pve3
            \      /
             \    /
              pve2

Any PVE node can fail and everything would still be reachable; the
shortest path is always chosen. Any link can fail, any gateway can fail,
even multiple links failing is fine. There is no chance for loops
because every link is point-to-point.
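(Side note on the guest configuration above: bringing such an interface
up by hand could look roughly like this. Untested sketch, using nothing
but the example names and addresses from above:

# inside the guest: a plain /32, no on-link subnet at all
ip addr add 1.1.1.1/32 dev eth0
# default route via the gateway address that lives on the PVE bridge;
# "onlink" is needed because 192.0.2.1 is not covered by any local prefix
ip route add default via 192.0.2.1 dev eth0 onlink

However a particular distro expresses that, the result is the routing
table shown above.)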
To come back to the topology: this is much like the full-mesh Ceph setup
with OSPF or OpenFabric. The same could be achieved with EVPN/VXLAN, an
anycast gateway and multiple exit nodes. The problem is the complexity,
and by giving routes bigger than a /24 to the gateways, they will not
always use the optimal path, which increases latency and puts
unnecessary routing load on hosts where the VM isn't living right now.
And all that just to have one L2 domain, which often brings more
disadvantages than advantages.

I hope I explained it better now; if not, feel free to ask anything. I
could also provide some more extensive documentation with screenshots
of everything.

>
>> When the tap interface of a VM gets plugged, a route needs to be
>> created. Routes per VM get created with the command: ip route add
>> 192.168.1.5/32 dev routedbridge.
>> The /32 gateway address needs to be configured on the bridge as well.
> This could be done in the respective tap_plug / veth_create functions
> inside pve-network [1]. You can override them on a per-zone basis, so
> that would fit right in. We'd have to implement analogous functions for
> teardown though, so we can remove the routes when updating / deleting
> the tap / veth.
>
> Someone has actually implemented a quite similar thing by utilizing
> hooks and a dedicated config file for each VM - see [2]. They're using
> IPv6-LL addresses though (which I would personally also prefer), but
> I'm unsure how it would work with Windows guests for instance, and it
> might be weird / unintuitive for some users (see my previous mail).

Yeah, sounds good. IPv6 support needs to be implemented for all of this
as well; I'm just starting with v4. (A rough sketch of the plug/teardown
commands follows a bit further below.)

>
>> There needs to be some way to configure the guests' IPs as well, but
>> in IPAM there is currently no way to set an IP for a VM; it's only
>> IP/MAC bindings.
> That's imo the real question left, where to store the additional IPs.
> Zone config is awkward, PVE IPAM might be workable with introducing
> additional fields (and, for a PoC, we could just deny using any other
> IPAM plugin than that and implement it later).
>
> Network device is probably the best bet, since we can then utilize the
> hotplug code in case an IP gets reassigned, which would be more
> complicated with the other approaches. The only reason why I'm
> reluctant is because we're introducing a property there that is
> specific to one particular SDN zone and unused by everything else.

I also feel like it would make sense on the network device, since it is
part of the specific configuration of that VM, but I get why you are
reluctant. This honestly makes me reconsider the SDN approach a little
bit. I have an idea here that could be workable: what if we add a field
but, instead of calling it "guest IP", we call it "routes"? Essentially
that is what it is, and it might have extra use cases apart from what
I'm trying to achieve. For this use case you could then use that field
to add the needed /32 host routes, and it wouldn't be specific to the
SDN feature we build. The SDN feature could then be more about
configuring the bridge with the right addresses and features, and about
enabling us to later distribute the routes via BGP and other ways. I
looked into the hotplug scenarios as well, and that way those would be
solved too.

>
>> A potential security flaw is also that devices on that bridge can
>> steal a configured IP by just replying to ARP.
>> That could be mitigated by disabling bridge learning and also creating
>> static ARP entries for those configured IPs.
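(Coming back to the tap_plug / teardown part quoted above: on the host
side, the per-guest part could boil down to roughly the following.
Untested sketch; "routedbridge" and the addresses are just the example
names from this thread:

# once per bridge/zone: the /32 gateway address the guests point their
# default route at
ip addr add 192.0.2.1/32 dev routedbridge
# on tap_plug: add the host route for the guest's address
ip route add 192.168.1.5/32 dev routedbridge
# on teardown: remove it again
ip route del 192.168.1.5/32 dev routedbridge

The zone code would just need to run the equivalent of these when
plugging and unplugging the tap/veth.)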
> That setting should be exposed in the zone configuration and probably
> be on by default. There's also always the option of using IP / MAC
> filters in the firewall, although the static fdb / neighbor table
> approach is preferable imo.

Perfect, I'm on the same page. Implementing it via fdb / neighbor
entries also ensures that this crucial feature is there for users who
run with the firewall disabled. (A rough sketch of those commands is at
the very end, below the links.)

>
> [0] https://docs.cilium.io/en/stable/network/lb-ipam/#requesting-ips
> [1] https://git.proxmox.com/?p=pve-network.git;a=blob;f=src/PVE/Network/SDN/Zones.pm;h=4da94580e07d6b3dcb794f19ce9335412fa7bc41;hb=HEAD#l298
> [2] https://siewert.io/posts/2022/announce-proxmox-vm-ips-via-bgp-1/
>
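And regarding the static fdb / neighbor mitigation: a rough sketch of
the host-side commands (untested; the tap interface name, MAC and IP
below are only placeholders):

# stop the bridge from learning MACs on the guest-facing port
bridge link set dev tap100i0 learning off
# pin the guest's MAC to that port with a static fdb entry
bridge fdb add bc:24:11:00:00:01 dev tap100i0 master static
# static neighbor entry, so another guest can't take over the IP via ARP
ip neigh replace 1.1.1.1 lladdr bc:24:11:00:00:01 dev routedbridge nud permanent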