From: Maurice Klein <klein@aetherus.de>
To: Stefan Hanreich <s.hanreich@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH container 1/1] Signed-off-by: Maurice Klein <klein@aetherus.de>
Date: Fri, 6 Feb 2026 12:22:30 +0100
Message-ID: <a2110fae-4877-49c7-91d8-62364b78f3f9@aetherus.de>
In-Reply-To: <fd4f6545-fc5c-4fed-b8cf-d0e062b6c5c6@proxmox.com>

Am 06.02.26 um 09:23 schrieb Stefan Hanreich:
> On 2/1/26 3:31 PM, Maurice Klein wrote:
>> Basically, the vnet and subnet part is what I see as an issue.
>> Since this kind of setup requires no defined subnets, the
>> current configuration doesn't fully make sense.
>> I guess you could still have a subnet configuration and configure all
>> the host addresses inside that subnet, but it's not really necessary.
>> Every VM route would be a /32 route, and the configured address on
>> that bridge (gateway field) would also be a /32.
> We would still need a local IP on the PVE host that acts as a gateway
> and preferably an IP for the VM inside the subnet so you can route the
> traffic for the /32 IPs there. So we'd need to configure e.g.
> 192.0.2.0/24 as subnet, then have the host as gateway (e.g. 192.0.2.1)
> and each VM gets an IP inside that subnet (which could automatically be
> handled via IPAM / DHCP). Looking at other implementations (e.g.
> kube-router) there's even a whole subnet pool and each node gets one
> subnet from that pool - but that's easier done with containers than VMs,
> so I think the approach with one shared subnet seems easier
> (particularly for VM mobility).

I think I didn't explain that part properly.
Basically, the whole idea is to have a gateway IP like 192.0.2.1/32 on
the PVE host on that bridge, and not have a /24 or similar route at all.
Guests then also have addresses, whatever they might look like.
For example, a guest could have 1.1.1.1/32; it would usually always be
a /32, although I guess for some use cases it could be beneficial for a
guest to get more than a /32, but let's put that aside for now.
Now there is no need to define which subnet a guest is on, and no need
for it to be in the same subnet as the host.

The guest would configure its IP statically inside, and it would
usually be a /32.

Now on the PVE host a route to 1.1.1.1/32 would be added by the
following command:

ip route add 1.1.1.1/32 dev bridgetest
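For illustration, the complete host side could be set up roughly like
this (just a sketch; the bridge name bridgetest and all addresses are
the example values from above):

ip link add bridgetest type bridge
ip link set bridgetest up
ip addr add 192.0.2.1/32 dev bridgetest   # /32 gateway address on the bridge
ip route add 1.1.1.1/32 dev bridgetest    # /32 host route towards the guest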

Guest configuration would look like this (simplified and shortened):
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
     inet 1.1.1.1/32

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG    100    0        0 eth0
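
To end up in that state, the guest would run something like this (again
just a sketch, assuming eth0 and the example addresses):

ip addr add 1.1.1.1/32 dev eth0
ip route add 192.0.2.1/32 dev eth0            # make the gateway on-link
ip route add default via 192.0.2.1 dev eth0   # default route via the host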

Now the biggest thing this enables: in PVE clusters, if we build for
example an iBGP full mesh, the routes get shared (see the FRR sketch
after the example below).
There could be any topology now, and routing would adapt.
Just as an example: while this is an ugly topology, it can illustrate
the point:

       GW-1        GW-2
         | \        / |
         |  \      /  |
         |   \    /   |
        pve1--pve3
            \      /
             \    /
              pve2

Any PVE node can fail and everything would still be reachable.
The shortest path will always be chosen.
Any link can fail.
Any gateway can fail.
Even multiple links failing is OK.
No chance for loops, because every link is p2p.
Much like the full-mesh Ceph setup with OSPF or OpenFabric.
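
As a rough illustration of the BGP part (hypothetical FRR config; the
ASN, peer address and prefix-list name are all made up), each node
could redistribute its kernel /32 host routes to its iBGP peers:

vtysh <<'EOF'
configure terminal
ip prefix-list host-routes permit 0.0.0.0/0 ge 32
route-map only-host-routes permit 10
 match ip address prefix-list host-routes
router bgp 65000
 neighbor 10.0.0.2 remote-as 65000
 address-family ipv4 unicast
  redistribute kernel route-map only-host-routes
end
EOF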

That could also be achieved with EVPN/VXLAN, an anycast gateway, and
multiple exit nodes.
The problem is the complexity, and because the gateways then only get
big routes like the /24 instead of per-VM /32s, they will not always
use the optimal path, thus increasing latency and putting unnecessary
routing load on hosts where the VM isn't living right now.
And all that just to have one L2 domain, which often brings more
disadvantages than advantages.

I hope I explained it well now; if not, feel free to ask anything. I
could also provide some more extensive documentation with screenshots
of everything.

>
>
>> When the tap interface of a VM gets plugged, a route needs to be
>> created. Routes per VM get created with the command: ip route add
>> 192.168.1.5/32 dev routedbridge.
>> The /32 gateway address needs to be configured on the bridge as well.
> This could be done in the respective tap_plug / veth_create functions
> inside pve-network [1]. You can override them on a per-zone basis so
> that would fit right in. We'd have to implement analogous functions
> for teardown though, so we can remove the routes when updating /
> deleting the tap / veth.
>
> Someone has actually implemented a quite similar thing by utilizing
> hooks and a dedicated config file for each VM - see [2]. They're using
> IPv6-LL addresses though (which I would personally also prefer), but
> I'm unsure how it would work with Windows guests for instance and it
> might be weird / unintuitive for some users (see my previous mail).
Yeah, sounds good.
IPv6 support needs to be implemented for all of this as well; I'm just
starting with v4.
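
To make the plug/teardown pair concrete, the hooks would essentially
boil down to this (a sketch; the tap name tap100i0 is a made-up
example):

# on plug:
ip link set tap100i0 master bridgetest
ip route add 1.1.1.1/32 dev bridgetest
# on teardown:
ip route del 1.1.1.1/32 dev bridgetest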

>
>
>> There needs to be some way to configure the guests' IPs as well, but
>> in IPAM there is currently no way to set an IP for a VM; there are
>> only IP/MAC bindings.
> That's imo the real question left: where to store the additional IPs.
> Zone config is awkward; PVE IPAM might be workable by introducing
> additional fields (and, for a PoC, we could just deny using any IPAM
> plugin other than that and implement it later).
>
> The network device is probably the best bet, since we can then utilize
> the hotplug code in case an IP gets reassigned, which would be more
> complicated with the other approaches. The only reason I'm reluctant
> is that we're introducing a property there that is specific to one
> particular SDN zone and unused by everything else.

I also feel like it would make sense in the network device, since it is
part of the configuration specific to that VM, but I get why you are
reluctant.
This honestly makes me reconsider the SDN approach a little bit.
I have an idea here that could be workable.
What if we add a field, but instead of calling it "guest IP" we call it
"routes"?
Essentially that is what it is, and it might have extra use cases apart
from what I'm trying to achieve.
That way, for this use case, you can use that field to add the needed
/32 host routes; see the strawman below.
It wouldn't be specific to the SDN feature we build.
The SDN feature could then be more about configuring the bridge with
the right addresses and features, and enabling us to later distribute
the routes via BGP and other ways.
I looked into the hotplug scenarios as well, and that way those would
be solved.
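
Purely as a strawman (this routes property does not exist today; the
name and syntax are made up), such a network device entry could look
like:

net0: virtio=BC:24:11:00:00:01,bridge=bridgetest,routes=1.1.1.1/32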

>
>> A potential security flaw is also that devices on that bridge can
>> steal a configured IP by just replying to ARP.
>> That could be mitigated by disabling bridge learning and also
>> creating static ARP entries for those configured IPs.
> That setting should be exposed in the zone configuration and probably be
> on by default. There's also always the option of using IP / MAC filters
> in the firewall although the static fdb / neighbor table approach is
> preferable imo.

Perfect, I'm on the same page.
Implementing it via fdb / neighbor entries also ensures that this
crucial feature is there for users who have the firewall disabled.
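
Plumbing-wise that would mean roughly the following per guest port (a
sketch; the tap name and MAC are made up):

bridge link set dev tap100i0 learning off    # no MAC learning on the port
bridge fdb add BC:24:11:00:00:01 dev tap100i0 master static
ip neigh replace 1.1.1.1 lladdr BC:24:11:00:00:01 dev bridgetest nud permanent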
>
> [0] https://docs.cilium.io/en/stable/network/lb-ipam/#requesting-ips
> [1]
> https://git.proxmox.com/?p=pve-network.git;a=blob;f=src/PVE/Network/SDN/Zones.pm;h=4da94580e07d6b3dcb794f19ce9335412fa7bc41;hb=HEAD#l298
> [2] https://siewert.io/posts/2022/announce-proxmox-vm-ips-via-bgp-1/
>



