Date: Thu, 4 May 2023 12:21:53 +0200
To: Dominik Csapak <d.csapak@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Fabian Grünbichler <f.gruenbichler@proxmox.com>,
 Markus Frank <m.frank@proxmox.com>,
 Wolfgang Bumiller <w.bumiller@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [PATCH manager v4 3/6] added Config for Shared
 Filesystem Directories

On 04/05/2023 10:57, Dominik Csapak wrote:
> On 5/4/23 10:42, Thomas Lamprecht wrote:
>> On 04/05/2023 10:31, Dominik Csapak wrote:
>>> On 5/4/23 10:13, Thomas Lamprecht wrote:
>>>> On 03/05/2023 13:26, Dominik Csapak wrote:
>>>>> just a short comment since this series overlaps a bit with my
>>>>> cluster resource mapping series (i plan on sending a v4 soon).
>>>>>
>>>>> i'd prefer to have the configuration endpoints for mapping bundled in a subdirectory
>>>>> so instead of /nodes/<node>/dirs/ i'd put it in /nodes/<node>/mapping/dirs/
>>>>> (or /nodes/<node>/map/dirs )
>>>>>
>>>>> @thomas, @fabian, any other input on that?
>>>>>
>>>>
>>>> huh? aren't mappings per definition cluster wide, i.e. /cluster/resource-map/<mapping-id>
>>>> that then allows to add/update the mapping of a resource on a specific node?
>>>> A node-specific path makes no sense to me, at max it would if adding/removing a mapping
>>>> is completely decoupled from adding/removing/updating entries to it – but that seems
>>>> convoluted from a usage POV and easy to get out of sync with the actual mapping list.
>>>
>>> in markus series the mappings are only ever per node, so each node has its
>>> own dir mapping
>>
>> Every resource mapping is always per node, so that's not really changing anything.
>>
>
> i meant there is no cluster aspect in the current version of markus series at all

only if you lock down the VM hard to be kept at the node its mapping was configured on.

>
>> Rather, what about migrations? Would be simpler from a migration and ACL POV to have
>> it cluster wide,
>
> sure, but how we get the info during migration is just an implementation detail and
> for that it shouldn't matter where the configs/api endpoints are


no it really isn't an implementation detail how ACLs govern access to mappings, which
in turn dictates where the ACL object paths should be, which then again has implications
for where the API endpoints and configs should be – node level isn't the one.
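
To make the ACL argument a bit more concrete, a minimal sketch (the path and privilege
names here are placeholders I made up, not a settled proposal): with a cluster-wide
mapping, the ACL object path can be derived from the mapping ID alone, no node needed:

    # hypothetical check in an API handler that uses/assigns a mapping
    my $rpcenv = PVE::RPCEnvironment::get();
    my $authuser = $rpcenv->get_user();
    # one object path per mapping ID, valid cluster-wide
    $rpcenv->check($authuser, "/mapping/pci/$mapping_id", ['Mapping.Use']);

With a node-level path one would need something like "/nodes/<node>/mapping/...", and the
granted permissions would stop being portable once the guest migrates.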

>>>
>>> in my series, the actual config was cluster-wide, but the api endpoints to configure
>>> them were sitting in the node path (e.g. /node/<nodes>/hardware-map/pci/*)
>>
>> Please no.
>
> was that way since the very first version, would have been nice to get that feedback
> earlier
>

IIRC I always talked about this being on cluster level, especially w.r.t. making it a
more general resource mapping; maybe I didn't look too closely and was thrown off
by the series name, i.e., "add *cluster-wide* hardware device mapping", and by the ACL
path discussion (which IIRC never had any node-specific param in there, and thus only
works if it's either a cluster-wide config or if ID uniqueness is guaranteed by the pmxcfs,
like we do for VMIDs).

Anyhow, sorry, I'll try to check for such things more closely earlier in the future.

>>>
>>> we *could* put them into the cluster path api, but we'd need to send a node parameter
>>> along and forward it there anyway, so that wouldn't really make a difference
>>
>> no need for that, see above.
>
> there is a need for the node parameter, because you always need to know for which node the
> mapping is anyway ;) or did you mean the 'forward' part?

yes, I certainly meant the forward part – otherwise my "Every resource mapping is always
per node" sentence and the node selector mentioned above would be really odd. A
node:local-id map always needs to exist, I figured that's the core thing this series wants
to solve, but it's not a hard requirement to forward every request (and a huge PITA
to keep in sync if done that way, or it would at least require using a cfs_domain_lock),
so no (hard) need for proxying.
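
As a rough illustration of why no proxying is needed (the config file name and layout are
hypothetical here, just to show the idea): any node can resolve a mapping from the shared
config and pick out the entry for the relevant node itself:

    # hypothetical cluster-wide mapping config on pmxcfs
    my $cfg = PVE::Cluster::cfs_read_file('resource-map.cfg');
    my $mapping = $cfg->{ids}->{$mapping_id}
        or die "no such mapping '$mapping_id'\n";
    # select the entry for the node the guest currently runs on
    my $entry = $mapping->{map}->{$node}
        or die "mapping '$mapping_id' has no entry for node '$node'\n";

Only modifying the mapping itself would need a cluster-wide lock, not every read or use.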


>>>
>>> for reading the mappings, that could be done there, but in my series, in the gui at least,
>>> i have to make a call to each node to get the current state of the mapping
>>> (e.g. if the pci device is still there)
>>
>> For now not ideal but ok, in the future I'd rather go in the direction of broadcasting
>> some types of HW resources via pmxcfs KV and then this isn't an issue anymore.
>
> yeah can be done, but the question is if we want to broadcast the whole pci/usb list
> from all nodes? (i don't believe that scales well?)

how is one or two dozen PCI IDs + some meta info, broadcast normally once per boot
(or statd start) per node, a scaling problem? Especially compared to the amount of data we
broadcast otherwise, it's rather negligible.
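
For scale, a minimal sketch of that direction, assuming the existing node KV helpers in
PVE::Cluster; the key name and the data layout are made up for illustration only:

    use JSON;
    use PVE::Cluster;

    # gather a small summary of this node's devices (a dozen or two entries)
    my $devices = [
        { id => '0000:3e:00.0', vendor => '10de', device => '1db6' },
        # ...
    ];
    # broadcast once after boot/statd start, not on every status loop
    PVE::Cluster::broadcast_node_kv('hw-devices', encode_json($devices));

    # any node can later read all nodes' lists, e.g. for the GUI
    my $per_node = PVE::Cluster::get_node_kv('hw-devices');

That is a few KiB per node at most, so small compared to what we broadcast otherwise.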

>
>>
>>>
>>> whether a mapping exists (globally) is not interesting most of the time, we only need to know
>>> if it exists at a specific node
>>
>> that's looking at it backwards, the user and ACL only care for global mappings; how
>> the code implements that is then, well, an implementation detail.
>
> ACL yes, the user must configure the mapping on a vm (which sits on a specific node)
> and there the mapping must exist on that node
>

Where they also don't care about local-nodeness, they just select a mapping (no need to hard
error on the mapping not being usable on the current node, the VM can be migrated before the
start – albeit it would certainly improve UX if that was visually hinted); otherwise they
could've just used a local ID directly.

>>
>>>
>>> also, after seeing markus' patches, i also leaned more in the direction of splitting
>>> my global mapping config into a config per type+node (so node1/usb.conf, node1/pci.conf,
>>
>> no, please no forest^W jungle of config trees :/
>>
>> A /etc/pve/resource-map/<type>.conf must be enough, even a /etc/pve/resource-map.conf
>> should be, tbh., but I could imagine that splitting per resource type makes some (schema)
>> things a bit easier and reduces some bug potential, so no too hard feelings on having one
>> cluster-wide per type; but really not more.
>>
>
> sure a single config is ok for me (only per type would be weird, since reusing the

Not sure if I'd find that flat-out weird, depends on the whole design and planned future
direction, and on whether it's OK that the ID used for a PCI mapping can never be reused for some
directory or other relatively unrelated mapping. FWIW, <type> here could also be the higher
level resource type (group), i.e., hardware.cfg for PCI/USB, one for local file system stuff
like directories.

FWIW, section configs have a disadvantage if very different config types are mixed, as the
schema is then also shown as the sum of all possible types in the API/man page (even storage.cfg,
while handling similar things, is already very confusing); possibly solvable in the api/manpage
doc generation though. Just mentioning it as I had that in my mind already a few times
and as we should improve that sometime, and when expanding use to wildly different types it
might be better to do it sooner rather than later (but not directly related to this).

> sectionconfig would only ever have a single type and we'd have to encode the
> nodename somehow)

Ideally we wouldn't have to encode the nodename, as doing so would especially make using
a single config weird (i.e., it would make <id>+<node> the section ID from a config POV
but not from a usage POV, which would then be weird for, e.g., a "pci: foo:nodeA" and a
"usb: foo:nodeB").

We should try to have a single config entry per ID, i.e., very handwavy:

pci: id
    comment foo
    allow-multi-function 1
    other-map-wide-e.g-enforcement-setting-a 1
    ...
    map nodeA:3e:00.0,nodeB:0a:00.0

Or better, with the long-contemplated array support (which you wouldn't necessarily need to
spearhead):


pci: id
    ...
    map node=A,id=3e:00.0
    map node=B,id=3e:01.0
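
Purely as a sketch of what that could look like in the section-config plugin (the property
and format names are placeholders, and the array support itself doesn't exist yet, so don't
read this as a concrete API):

    # hypothetical properties() of a resource-map SectionConfig plugin
    sub properties {
        return {
            comment => { type => 'string', optional => 1 },
            'allow-multi-function' => { type => 'boolean', optional => 1 },
            map => {
                type => 'array',
                description => "Per-node device entry of this mapping.",
                items => {
                    type => 'string',
                    format => 'pve-hw-map-entry', # hypothetical: node=<node>,id=<pci-address>
                },
            },
        };
    }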

I read (but not in extended detail) "[PATCH common v3 2/3] add PVE/HardwareMap", and the above
is probably not expressive enough on its own, but OTOH the formats used in the patch
seem a bit too flexible (i.e., it makes not much sense to allow mdev on the map entry of one
node but not on the one from another node – in practice that can hardly work for VMs I
guess, and would limit any future live-migration caps?). Also, why is the ID split up like
that there?

I'd rather try to keep the map to the minimal ID info and some general restrictions (e.g.,
if some special use is allowed), but other details that only apply when using the mapped
device in a guest would still be handled there (as long as the mapping allows it).

Sorry for the somewhat high-level and slightly hand-wavy description; if you list specific
problems and required things, I can try to formalize a bit more the config format I'd envision,
so that it is then also useful for other resource mappings.

And just to bring it to the "table": in theory we _maybe_ could split naming a node device
and the cluster-wide mapping, i.e., the mapping would be, well, cluster-wide, and naming
devices plus some extra "how can/should it be used" would then be node-specific (possibly
not even in /etc/pve) – not saying I'd prefer that, but that's something that could be
explored before finalizing on a design; I don't want to rush this, as we have to support
something as big and certainly popular as this basically forever, and while config migrations
can be done somehow, they're basically always a pita.