From: "Mark Schouten" <mark@tuxis.nl>
To: "Thomas Lamprecht", "Proxmox Backup Server development discussion" <pbs-devel@lists.proxmox.com>
Date: Mon, 18 Jul 2022 07:31:41 +0000
Subject: Re: [pbs-devel] Scheduler causing connectivity issues?

Hi,

> You have 30% of runnable processes getting stalled waiting for IO; that
> naturally should not cause the request accept future to get starved, but
> it is the reason why it happened with the current (or rather: old)
> architecture. Increasing available memory, so that the page cache can
> hold more entries, could already relieve that system a bit.

Thanks.
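
For anyone else following along: this is how I understand the starvation
mechanism, as a toy tokio sketch (my own illustration, not actual PBS
code). A synchronous filesystem operation on the async executor thread
keeps the accept future from being polled:

use std::time::Duration;
use tokio::net::TcpListener;

// Toy reproducer: a single-threaded runtime makes the starvation
// deterministic; the real server has more worker threads, but the
// effect is the same once they are all stuck in blocking IO.
#[tokio::main(flavor = "current_thread")]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8007").await?;

    tokio::spawn(async {
        loop {
            // Stand-in for a snapshot listing on slow disks: a
            // synchronous call that blocks the only executor thread.
            std::thread::sleep(Duration::from_secs(5));
            tokio::task::yield_now().await;
        }
    });

    loop {
        // While the task above blocks, this accept() is never polled,
        // so new connections time out even though the host is up.
        let (_socket, peer) = listener.accept().await?;
        println!("accepted connection from {peer}");
    }
}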
Please note that /var/lib/proxmox is on a different set of disks than the
datastores. The root pool is on two PM883s; the datastore is lots of
spinning disks with NVMe special devices. Not sure if that's relevant to
your findings, but here you have it :) A memory upgrade is somewhere on
our roadmap.

> We improved on the reproducer we got locally by simulating a
> higher-latency disk using dm-delay on a small single-core VM.
>
> For one, we made libpve-storage-perl do more efficient list-snapshot
> requests if they can be filtered by VMID, and on the PBS side we moved
> most operations that cause IO (and are related to backup
> groups/snapshots) to a separate thread pool, so that the main thread
> should be less congested/blocked.

Given the other responses in this thread, I'm not going to upgrade to a
testing version in production yet. Please let me know if there is any
other info you need from me. (I've put a sketch of how I read the
thread-pool change below my signature.)

--
Mark Schouten, CTO
Tuxis B.V.
mark@tuxis.nl
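
PS: purely as a sketch of how I read "moved to a separate thread pool"
in generic tokio terms -- hypothetical names, not the actual PBS patch:

use std::io;
use std::path::PathBuf;

// Hypothetical stand-in for a group/snapshot listing that does
// synchronous filesystem IO on the datastore.
fn list_snapshots_sync(store: PathBuf) -> io::Result<Vec<String>> {
    let mut snapshots = Vec::new();
    for entry in std::fs::read_dir(store)? {
        snapshots.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(snapshots)
}

// spawn_blocking moves the IO onto tokio's dedicated blocking pool,
// so the async worker threads (and the accept loop) keep running.
async fn list_snapshots(store: PathBuf) -> io::Result<Vec<String>> {
    tokio::task::spawn_blocking(move || list_snapshots_sync(store))
        .await
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
}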