From: "Mark Schouten" <mark@tuxis.nl>
To: "Shannon Sterz" <s.sterz@proxmox.com>
Cc: Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
Date: Thu, 19 Dec 2024 09:56:23 +0000
Subject: Re: [pbs-devel] Authentication performance

Hi,

We upgraded to 3.3 yesterday; there is not much gain to notice with regard to
the new version or the change in keying. It's still (obviously) pretty busy.

However, I also tried to remove some datastores, which failed with timeouts.
PBS even stopped authenticating (so probably just stopped working altogether)
for about 10 seconds, which was an unpleasant surprise.

So looking into that further, I noticed the following logging:

Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client [::ffff]:42104] Unable to acquire lock "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error 4)
Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client [::ffff]:42144] Unable to acquire lock "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error 4)
Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client [::ffff]:47286] Unable to acquire lock "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error 4)
Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client [::ffff:]:45994] Unable to acquire lock "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error 4)

That surprised me, since this is a 'status' call, which should not need to
lock the datastore config.

https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/api2/admin/datastore.rs;h=c611f593624977defc49d6e4de2ab8185cfe09e9;hb=HEAD#l687
does not lock the config, but
https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-datastore/src/datastore.rs;h=0801b4bf6b25eaa6f306c7b39ae2cfe81b4782e1;hb=HEAD#l204
does.

So if I understand this correctly, every 'status' call (30 per second in our
case) locks the datastore config exclusively. And also, every time 'status'
gets called, the whole datastore config gets loaded?

Is that something that could use some performance tuning?
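
To make that pattern a bit more concrete, the following is a minimal,
self-contained Rust sketch; it is not the actual PBS code path. It assumes the
fs2 crate for advisory file locks, and the datastore.cfg path and function
names are only illustrative. It contrasts a status-style read that takes the
exclusive lock file with one that only parses the config; when something like
a datastore removal holds the exclusive lock, every caller of the first
variant blocks, and a signal arriving during that wait would presumably
surface as the "Interrupted system call (os error 4)" seen above.

// Illustrative sketch only, not the actual PBS code path. Assumes the `fs2`
// crate for advisory file locks; the datastore.cfg path and function names
// are made up for this example.
use std::fs::{File, OpenOptions};
use std::io::Read;

use fs2::FileExt;

const CONFIG_PATH: &str = "/etc/proxmox-backup/datastore.cfg"; // assumed path
const LOCK_PATH: &str = "/etc/proxmox-backup/.datastore.lck";

// What the log excerpt suggests is happening: even a read-only 'status'
// handler takes the exclusive datastore-config lock, so 30 requests per
// second all serialize behind this one lock file.
fn load_config_with_exclusive_lock() -> std::io::Result<String> {
    let lock_file = OpenOptions::new()
        .write(true)
        .create(true)
        .open(LOCK_PATH)?;
    lock_file.lock_exclusive()?; // blocks until the lock is free

    let mut contents = String::new();
    File::open(CONFIG_PATH)?.read_to_string(&mut contents)?;
    Ok(contents) // lock released when `lock_file` is dropped
}

// What a pure status read could do instead: just parse the config (or take a
// shared lock), so concurrent readers do not serialize behind writers.
fn load_config_read_only() -> std::io::Result<String> {
    let mut contents = String::new();
    File::open(CONFIG_PATH)?.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> std::io::Result<()> {
    let _ = load_config_with_exclusive_lock()?;
    let _ = load_config_read_only()?;
    Ok(())
}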
—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / mark@tuxis.nl


------ Original Message ------
From "Shannon Sterz" <s.sterz@proxmox.com>
To "Mark Schouten" <mark@tuxis.nl>
Cc "Proxmox Backup Server development discussion" <pbs-devel@lists.proxmox.com>
Date 16/12/2024 12:51:47
Subject Re: Re[2]: [pbs-devel] Authentication performance

>On Mon Dec 16, 2024 at 12:23 PM CET, Mark Schouten wrote:
>> Hi,
>>
>> >
>> >would you mind sharing either `authkey.pub` or the output of the
>> >following commands:
>> >
>> >head --lines=1 /etc/proxmox-backup/authkey.key
>> >cat /etc/proxmox-backup/authkey.key | wc -l
>>
>> -----BEGIN RSA PRIVATE KEY-----
>> 51
>>
>> So that is indeed the legacy method. We are going to upgrade our PBS'es
>> on Wednesday.
>>
>> >
>> >The first should give the PEM header of the authkey, whereas the second
>> >gives the number of lines that the key takes up in the file. Both
>> >give an indication of whether you are using the legacy RSA keys or newer
>> >Ed25519 keys. The latter should provide more performance; security should
>> >not be affected much by this change. If the output of the commands looks
>> >like this:
>> >
>> >-----BEGIN PRIVATE KEY-----
>> >3
>> >
>> >Then you are using the newer keys. There currently isn't a recommended
>> >way to upgrade the keys. However, in theory you should be able to remove
>> >the old keys, restart PBS, and it should just generate keys in the new
>> >format. Note that this will log out anyone who is currently
>> >authenticated and they'll have to re-authenticate.
>>
>> Seems like a good moment to update those keys as well.
>
>Sure, just be aware that you have to manually delete the key before
>restarting the PBS. Upgrading alone won't affect the key. Ideally you'd
>test this before rolling it out, if you can.
>
>> >In general, tokens should still be faster to authenticate, so we'd
>> >recommend that you try to get your users to switch to token-based
>> >authentication where possible. Improving performance there is a bit
>> >trickier though, as it often comes with a security trade-off (in the
>> >background we use yescrypt for the authentication there, which
>> >deliberately adds a work factor). However, we may be able to improve
>> >performance a bit via caching methods or similar.
>>
>> Yes, that might help. I'm also not sure if it actually is
>> authentication, or if it is the datastore call that the PVE environments
>> make. As you can see in your support issue 3153557, it looks like some
>> requests loop through all datastores before responding with a limited
>> set of datastores.
>
>I looked at that ticket and yes, that is probably unrelated to
>authentication.
>
>> For instance (and I'm a complete noob wrt Rust), but if I understand
>> https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/api2/admin/datastore.rs;h=11d2641b9ca2d2c92da1a85e4cb16d780368abd3;hb=HEAD#l1315
>> correctly, PBS loops through all the datastores, checks mount status and
>> config, and only starts filtering at line 1347. If I understand that
>> correctly, in our case with over 1100 datastores, that might cause quite
>> some load?
>
>Possible, yes, that would depend on your configuration. Are all of these
>datastores defined with a backing device? Because if not, then this
>should be fairly fast (as in, this should not actually touch the disks).
>If they are, then yes, this could be slow as each store would trigger at
>least 2 stat calls afaict.
>
>In any case, it should be fine to move the `mount_status` check after
>the `if allowed || allow_id` check from what I can tell. Not sure why
>we'd need to check the mount_status for a datastore we won't include in
>the results anyway. Same goes for parsing the store config imo. Sent a
>patch for that [1].
>
>[1]: https://lore.proxmox.com/pbs-devel/20241216115044.208595-1-s.sterz@proxmox.com/T/#u
>
>>
>> Thanks,
>>
>> —
>> Mark Schouten
>> CTO, Tuxis B.V.
>> +31 318 200208 / mark@tuxis.nl
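
For reference, the reordering suggested in the quoted reply above (checking
permissions before probing mount status and parsing per-store config) roughly
amounts to the toy sketch below. All type and function names here are invented
for illustration and do not match the actual datastore.rs code; the point is
only that the per-store probing happens after the permission filter.

// Toy sketch of the reordering discussed above; all type and function names
// are invented for illustration and do not match the actual PBS source.
struct DataStoreConfig {
    name: String,
    backing_device: Option<String>,
}

#[derive(Debug)]
struct DataStoreListItem {
    name: String,
    mounted: Option<bool>,
}

// Stand-in for the per-user ACL check in the real list handler.
fn is_allowed(_store: &str) -> bool {
    true
}

// Stand-in for the mount-status probe; in the real code this is where the
// stat calls on the backing device happen, i.e. the expensive part that is
// worth skipping for stores the caller cannot see anyway.
fn mount_status(cfg: &DataStoreConfig) -> Option<bool> {
    cfg.backing_device.as_ref().map(|_| true)
}

fn list_datastores(configs: &[DataStoreConfig]) -> Vec<DataStoreListItem> {
    configs
        .iter()
        // Filter on permissions first ...
        .filter(|cfg| is_allowed(&cfg.name))
        // ... and only probe mount status for stores that will actually be
        // returned. With over 1100 datastores this avoids most of the
        // per-store work for a typical caller.
        .map(|cfg| DataStoreListItem {
            name: cfg.name.clone(),
            mounted: mount_status(cfg),
        })
        .collect()
}

fn main() {
    let configs = vec![
        DataStoreConfig { name: "store1".into(), backing_device: None },
        DataStoreConfig { name: "store2".into(), backing_device: Some("/dev/sdb1".into()) },
    ];
    println!("{:?}", list_datastores(&configs));
}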