Date: Tue, 10 Feb 2026 10:56:19 +0100
From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Maurice Klein, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH container 1/1] Signed-off-by: Maurice Klein

On 2/6/26 12:21 PM, Maurice Klein wrote:
[snip]
> I think I didn't explain that properly.
> Basically the whole idea is to have a gateway IP like 192.0.2.1/32 on
> the PVE host on that bridge and not have a /24 or so route then.

Those are just local to the node for routing; the /24 wouldn't get
announced - only the /32 routes for the additional IPs. But I guess with
that setup you could do without it as well. It shouldn't be an issue to
create a 'subnet' as a /32, give the PVE host its only IP as the gateway
IP and configure it that way. Layer-2 zones (VLAN, QinQ, VXLAN), for
instance, don't even need a subnet configured at all - so I don't see a
blocker there.
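Roughly, the node-local part I have in mind boils down to something like
the following (the bridge name vnet0 and the guest address are just
placeholders):

    # gateway IP as a /32 on the vnet bridge
    ip address add 192.0.2.1/32 dev vnet0
    # per-guest /32 host route, added when the guest NIC gets plugged here
    ip route add 203.0.113.10/32 dev vnet0

So the only routes that ever exist for the vnet are host routes in the
default table.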
> Guests then also have addresses, whatever they might look like.
> For example a guest could have 1.1.1.1/32, but usually always /32,
> although I guess for some use cases it could be beneficial to be able
> to have a guest that gets more than a /32, but let's put that aside
> for now.

Would be quite interesting for IPv6, actually.

> Now there is no need/reason to define which subnet a guest is on and
> no need for it to be in the same subnet as the host.
>
> The guest would configure its IP statically inside, and it would
> usually be a /32.

Yeah, most implementations I've seen usually have a 'cluster IP' that is
in an RFC 1918 range for local cluster communication. With containers
that works a lot more easily, since you can control their network
configuration, whereas with VMs you cannot and would need to update the
configuration on every move - or use the same subnet across every node
instead of having a dedicated subnet per node/rack.

[snip]

> Now the biggest thing this enables us to do: in PVE clusters, if we
> build for example an iBGP full mesh, the routes get shared.
> There could be any topology now and routing would adapt.
> Just as an example - while that is not a great topology, it can
> illustrate the point:
>
>       GW-1        GW-2
>         | \        / |
>         |  \      /  |
>         |   \    /   |
>        pve1--pve3
>            \      /
>             \    /
>              pve2
>
> Any PVE node can fail and everything would still be reachable.
> The shortest path will always be chosen.
> Any link can fail.
> Any gateway can fail.
> Even multiple links failing is OK.
> No chance for loops, because every link is p2p.
> Much like the full-mesh Ceph setup with OSPF or OpenFabric.
>
> That can be achieved with EVPN/VXLAN, anycast gateway and multiple
> exit nodes.
> The problem is the complexity, and by giving routes bigger than a /24
> to the gateways they will not always use the optimal path, thus
> increasing latency and putting unnecessary routing load on hosts where
> the VM isn't living right now.
> And all that to have one L2 domain, which often brings more
> disadvantages than advantages.
>
> I hope I explained it well now; if not, feel free to ask anything. I
> could also provide some bigger documentation with screenshots of
> everything.

Yes, that makes sense. The way I described it in my previous mail should
work like that, since it decouples the IP configuration + route creation
(which would then be handled by the zone / vnet) from the announcement
of that route (which would be handled by fabrics).

As a start we could just utilize the default routing table. I'm planning
on adding VRF + Route Redistribution + Route Map support mid-term, so
the new zone could then profit from those without having to implement
anything of the sort for now.
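Just for reference, the FRR config that fabrics would eventually
generate for this could boil down to something like the following (ASN
and neighbor addresses are of course just placeholders):

    router bgp 65000
     neighbor 10.0.0.2 remote-as 65000
     neighbor 10.0.0.3 remote-as 65000
     address-family ipv4 unicast
      redistribute kernel route-map only-host-routes
     exit-address-family
    !
    ip prefix-list host-routes seq 10 permit 0.0.0.0/0 ge 32
    !
    route-map only-host-routes permit 10
     match ip address prefix-list host-routes

That way only the /32 host routes from the default table make it into
the iBGP mesh, which should be all this zone needs for starting out.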
The timing is a bit awkward, since I'm still working on implementing
several features that this plugin would benefit from quite heavily, and
I don't want to do any duplicate work or code ourselves into a corner by
basically implementing all that functionality specific to this plugin
only, and then having to migrate everything over while maintaining
backwards compatibility.

[snip]

> I also feel like it would make sense in the network device, since it
> is part of the specific configuration for that VM, but I get why you
> are reluctant about that.
> This honestly makes me reconsider the SDN approach a little bit.
> I have an idea here that could be workable.
> What if we add a field not called 'guest IP', but instead call it
> 'routes'?
> Essentially that is what it is, and it might have extra use cases
> apart from what I'm trying to achieve.
> That way, for this use case, you can use those fields to add the
> needed /32 host routes.
> It wouldn't be specific to the SDN feature we build.
> The SDN feature could then be more about configuring the bridge with
> the right addresses and features, and enable us to later distribute
> the routes via BGP and other means.
> I looked into the hotplug scenarios as well and that way those would
> be solved.

Yeah, I think the VM configuration is the best bet. It should be tied to
the network device imo, so I guess adding a property that allows
configuring a CIDR there should be fine for starting out. Adding the
route would then be handled by the respective tap_plug / veth_create
functions in pve-network and the new zone plugin.
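Just to sketch how that could look in the guest config - 'routes' is
only a stand-in name here, and the MAC / address are made up:

    # hypothetical new property on the network device, name/format TBD
    net0: virtio=BC:24:11:00:00:01,bridge=vnet0,routes=203.0.113.10/32

veth_create / tap_plug would then install the corresponding /32 host
route on the node when the NIC gets plugged there.

[snip]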