Subject: Re: [pve-devel] [PATCH container 1/1] Signed-off-by: Maurice Klein
Date: Fri, 6 Feb 2026 12:22:30 +0100
From: Maurice Klein <klein@aetherus.de>
To: Stefan Hanreich, pve-devel@lists.proxmox.com
On 06.02.26 at 09:23, Stefan Hanreich wrote:
> On 2/1/26 3:31 PM, Maurice Klein wrote:
>> Basically, the vnet and subnet part is what I see as an issue.
>> Since this kind of setup requires no defined subnets, the current
>> configuration doesn't fully make sense.
>> I guess you could still have a subnet configuration and configure all
>> the host addresses inside that subnet, but it's not really necessary.
>> Every VM route would be a /32 route, and the address configured on
>> that bridge (gateway field) would also be a /32.
> We would still need a local IP on the PVE host that acts as a gateway
> and preferably an IP for the VM inside the subnet so you can route the
> traffic for the /32 IPs there. So we'd need to configure e.g.
> 192.0.2.0/24 as subnet, then have the host as gateway (e.g. 192.0.2.1)
> and each VM gets an IP inside that subnet (which could automatically be
> handled via IPAM / DHCP). Looking at other implementations (e.g.
> kube-router) there's even a whole subnet pool and each node gets one
> subnet from that pool - but that's easier done with containers than VMs,
> so I think the approach with one shared subnet seems easier
> (particularly for VM mobility).

I think I didn't explain that properly. The whole idea is to have a
gateway IP like 192.0.2.1/32 on the PVE host on that bridge and not to
have a /24 (or similar) route at all. Guests then also have addresses,
whatever they might look like. For example, a guest could have
1.1.1.1/32 - usually always a /32, although for some use cases it could
be beneficial to give a guest more than a /32, but let's put that aside
for now.

So there is no need to define which subnet a guest is on, and no need
for it to be in the same subnet as the host. The guest configures its IP
statically inside, and it would usually be a /32. On the PVE host, a
host route to 1.1.1.1/32 would then be added with the following command:

ip route add 1.1.1.1/32 dev bridgetest

The guest configuration would look like this (simplified and shortened;
a command-level sketch follows further below):

eth0:
    inet 1.1.1.1/32

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG    1.00   0        0 eth0

The biggest thing this enables: in PVE clusters, if we build for example
an iBGP full mesh, the routes get shared. There could be any topology
and the routing would adapt. Just as an example - not a great topology,
but it illustrates the point:

      GW-1        GW-2
         | \        / |
         |  \      /  |
         |   \    /   |
        pve1--pve3
            \      /
             \    /
              pve2

Any PVE node can fail and everything would still be reachable; the
shortest path is always chosen. Any link can fail, any gateway can fail,
even multiple links failing is fine. There is no chance for loops
because every link is point-to-point.
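(Side note on the guest configuration above: bringing such an interface
up by hand could look roughly like this. Untested sketch, using nothing
but the example names and addresses from above:

# inside the guest: a plain /32, no on-link subnet at all
ip addr add 1.1.1.1/32 dev eth0
# default route via the gateway address that lives on the PVE bridge;
# "onlink" is needed because 192.0.2.1 is not covered by any local prefix
ip route add default via 192.0.2.1 dev eth0 onlink

However a particular distro expresses that, the result is the routing
table shown above.)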
To come back to the topology: this is much like the full-mesh Ceph setup
with OSPF or OpenFabric. The same could be achieved with EVPN/VXLAN, an
anycast gateway and multiple exit nodes. The problem is the complexity,
and by giving routes bigger than a /24 to the gateways, they will not
always use the optimal path, which increases latency and puts
unnecessary routing load on hosts where the VM isn't living right now.
And all that just to have one L2 domain, which often brings more
disadvantages than advantages.

I hope I explained it better now; if not, feel free to ask anything. I
could also provide some more extensive documentation with screenshots
of everything.

>
>> When the tap interface of a VM gets plugged, a route needs to be
>> created. Routes per VM get created with the command: ip route add
>> 192.168.1.5/32 dev routedbridge.
>> The /32 gateway address needs to be configured on the bridge as well.
> This could be done in the respective tap_plug / veth_create functions
> inside pve-network [1]. You can override them on a per-zone basis, so
> that would fit right in. We'd have to implement analogous functions for
> teardown though, so we can remove the routes when updating / deleting
> the tap / veth.
>
> Someone has actually implemented a quite similar thing by utilizing
> hooks and a dedicated config file for each VM - see [2]. They're using
> IPv6-LL addresses though (which I would personally also prefer), but
> I'm unsure how it would work with Windows guests for instance, and it
> might be weird / unintuitive for some users (see my previous mail).

Yeah, sounds good. IPv6 support needs to be implemented for all of this
as well; I'm just starting with v4. (A rough sketch of the plug/teardown
commands follows a bit further below.)

>
>> There needs to be some way to configure the guests' IPs as well, but
>> in IPAM there is currently no way to set an IP for a VM; it's only
>> IP/MAC bindings.
> That's imo the real question left, where to store the additional IPs.
> Zone config is awkward, PVE IPAM might be workable with introducing
> additional fields (and, for a PoC, we could just deny using any other
> IPAM plugin than that and implement it later).
>
> Network device is probably the best bet, since we can then utilize the
> hotplug code in case an IP gets reassigned, which would be more
> complicated with the other approaches. The only reason why I'm
> reluctant is because we're introducing a property there that is
> specific to one particular SDN zone and unused by everything else.

I also feel like it would make sense on the network device, since it is
part of the specific configuration of that VM, but I get why you are
reluctant. This honestly makes me reconsider the SDN approach a little
bit. I have an idea here that could be workable: what if we add a field
but, instead of calling it "guest IP", we call it "routes"? Essentially
that is what it is, and it might have extra use cases apart from what
I'm trying to achieve. For this use case you could then use that field
to add the needed /32 host routes, and it wouldn't be specific to the
SDN feature we build. The SDN feature could then be more about
configuring the bridge with the right addresses and features, and about
enabling us to later distribute the routes via BGP and other ways. I
looked into the hotplug scenarios as well, and that way those would be
solved too.

>
>> A potential security flaw is also that devices on that bridge can
>> steal a configured IP by just replying to ARP.
>> That could be mitigated by disabling bridge learning and also creating
>> static ARP entries for those configured IPs.
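(Coming back to the tap_plug / teardown part quoted above: on the host
side, the per-guest part could boil down to roughly the following.
Untested sketch; "routedbridge" and the addresses are just the example
names from this thread:

# once per bridge/zone: the /32 gateway address the guests point their
# default route at
ip addr add 192.0.2.1/32 dev routedbridge
# on tap_plug: add the host route for the guest's address
ip route add 192.168.1.5/32 dev routedbridge
# on teardown: remove it again
ip route del 192.168.1.5/32 dev routedbridge

The zone code would just need to run the equivalent of these when
plugging and unplugging the tap/veth.)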
> That setting should be exposed in the zone configuration and probably
> be on by default. There's also always the option of using IP / MAC
> filters in the firewall, although the static fdb / neighbor table
> approach is preferable imo.

Perfect, I'm on the same page. Implementing it via fdb / neighbor
entries also ensures that this crucial feature is there for users who
run with the firewall disabled. (A rough sketch of those commands is at
the very end, below the links.)

>
> [0] https://docs.cilium.io/en/stable/network/lb-ipam/#requesting-ips
> [1] https://git.proxmox.com/?p=pve-network.git;a=blob;f=src/PVE/Network/SDN/Zones.pm;h=4da94580e07d6b3dcb794f19ce9335412fa7bc41;hb=HEAD#l298
> [2] https://siewert.io/posts/2022/announce-proxmox-vm-ips-via-bgp-1/
>
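And regarding the static fdb / neighbor mitigation: a rough sketch of
the host-side commands (untested; the tap interface name, MAC and IP
below are only placeholders):

# stop the bridge from learning MACs on the guest-facing port
bridge link set dev tap100i0 learning off
# pin the guest's MAC to that port with a static fdb entry
bridge fdb add bc:24:11:00:00:01 dev tap100i0 master static
# static neighbor entry, so another guest can't take over the IP via ARP
ip neigh replace 1.1.1.1 lladdr bc:24:11:00:00:01 dev routedbridge nud permanent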